PREFACE TO SECOND IMPRESSION



" It is all very well to say that the world of reality 
should be kept separate and distinct from the world of 
mathematics. In the trivial operations of daily life, 
you may be able to keep clear of the concepts of mathe- 
matics, but once you begin to touch science, the dan- 
gerous contact is established." 

Professor John L. Synge (Science — Sense and Nonsense) 

Of recent years, many people, often with little mathematical 
training, have had to teach themselves some Statistics. Few, 
probably, have found it easy, for, at least from one point of 
view, Statistics is a branch of applied mathematics, and, so, 
some mathematics is essential, even in an elementary treat- 
ment of the subject. 

One can, of course, try to learn some of the basic routines 
and tests as rules of thumb, hoping that whatever situation 
may arise will be a text-book one to which the appropriate 
text-book test will obligingly apply. The trouble is, however, 
that such situations are rare, and very soon one begins to 
wish that one had learned a little about the actual nature of 
the fundamental tests. 

It is the aim of this book to help those who have to teach 
themselves some Statistics to an understanding of some of the 
fundamental ideas and mathematics involved. Once that has 
been acquired, problems of application are the more readily and 
successfully tackled. For this reason, and for reasons of space, 
the concentration has been on fundamentals. But Statistics is 
a dynamic, developing science. New techniques, new methods 
of analysis are constantly arising and influencing even the 
foundations. The reader is urged to bear this fact in mind all 
the time and, particularly, when reading Chapters VI and IX. 

The standard of mathematics assumed is not high. Occasion- 
ally the algebra may appear to be a little tedious, but it is 
not difficult. Whenever possible, references to Mr. Abbott's 
books in this series, especially Teach Yourself Calculus, have 
been given, and the reader is strongly advised to follow them 
up. Where this has not been possible and new ground has 
been broken, a note has been added at the end of the appro- 
priate chapter. Continuous bivariate distributions, including 
the normal bivariate distribution, which involve double inte- 
grals, have been treated in an appendix. In case notation 
should present difficulties, a list of a few mathematical symbols
and their meanings follows that appendix. A set of Exercises 
concludes each chapter, except the first, but the student is 
urged also to tackle those provided in some of the books listed 
on page 239. Specially important as, perhaps, the most 
useful collection of exercises at present available is that pro- 
vided by the two parts of Elementary Statistical Exercises 
issued by, and obtainable from, the Department of Statistics,
University College, London. To be adequately equipped to 
tackle such exercises the student is recommended to have by 
him : 

(1) Chambers's Shorter Six-Figure Mathematical Tables by 
the late Dr. L. J. Comrie (W. and R. Chambers); and

(2) Cambridge Elementary Statistical Tables, by D. V. 
Lindley and J. C. P. Miller (C.U.P.). 

A desk calculator is not essential, but if the student can possibly 
obtain one he should certainly do so. Of the hand-operated 
models, the Madas 10R is recommended. 

Lastly, to the staff of the English Universities Press and to 
the printers I wish to express my appreciation of the care 
they have bestowed upon this book; to my sister, Miss Nancy 
Goodman, to Messrs. F. T. Chaffer, Alec Bishop, and Leonard 
Cutts, and to the late Dr. J. Wishart, my thanks are due for 
their encouragement and suggestions; while to all those who 
have drawn my attention to mistakes and errors, especially 
Dr. P. G. Moore, the Rev. Liam Grimley and Dr. van de Geer,
I express my gratitude; wherever possible the necessary
corrections have been made.

R. G. 

BRIGHTON, 
1960.
INTRODUCTORY: A FIRST LOOK AROUND

1.1. Statistics and Statistics. Most of us have some idea of 
what the word statistics means. We should probably say that 
it has something to do with tables of figures, diagrams and 
graphs in economic and scientific publications, with the cost of 
living, with public-opinion polls, life insurance, football pools, 
cricket averages, population-census returns, "intelligence"
tests, with production planning and quality control in industry 
and with a host of other seemingly unrelated matters of concern 
or unconcern. We might even point out that there seem to 
be at least two uses of the word : 

the plural use, when the word denotes some systematic 
collection of numerical data about some topic or topics ; 

the singular use, when the word denotes a somewhat 
specialised human activity concerned with the collection, 
ordering, analysis and interpretation of such data, and 
with the general principles involved in this activity. 

Our answer would be on the right lines. Nor should we be 
unduly upset if, to start with, we seem a little vague. Statisticians
themselves disagree about the definition of the word : over
a hundred definitions have been listed (W. F. Willcox, Revue
de l'Institut International de Statistique, vol. 3, p. 288, 1935),
and there are many others. One of the greatest of British
statisticians, M. G. Kendall, has given his definition as follows :

" Statistics is the branch of scientific method which deals 
with the data obtained by counting or measuring the pro- 
perties of populations of natural phenomena. In this 
definition ' natural phenomena ' includes all the happen- 
ings of the external world, whether human or not " 
(Advanced Theory of Statistics, vol. 1, p. 2).

Statistics, as a science, is, however, not merely descriptive ; 
like all sciences, it is concerned with action. In his 1952 
Presidential Address to the American Statistical Association, 
A. J. Wickens remarked : 

" Statistics of a sort can, of course, be traced back to 
ancient times, but they have flowered since the industrial 
revolution. Beginning in the 19th century, statistical
records were developed to describe the society of that era, 
and to throw light on its economic and social problems. 
No doubt they influenced the course of men's thinking 
then, and even, in some instances, may have led to new 
policies and new laws; but primarily their uses were 
descriptive. Increasingly, in the 20th century, and 
especially since World War I, statistics have been used to 
settle problems, and to determine courses of action. In 
private enterprise, quality control tests now change the 
production lines of industrial enterprises. New products 
are developed and tested by statistical means. Scientific 
experiments turn upon statistics. . . ." (" Statistics and 
the Public Interest," Journal of the American Statistical 
Association, vol. 48, No. 261, March 1953, pp. 1-2).

Two other U.S. writers, F. C. Mills and C. D. Long, have 
stressed : 

" In high degree the emphasis in the work of the 
statistician has shifted from this backward-looking pro- 
cess to current affairs and to proposed future operations 
and their consequences. Experiments are designed, 
samples selected, statistics collected and analysed with
reference to decisions that must be made, controls that
must be exercised, judgments that entail action" ("Statistical
Agencies of the Federal Government, 1948", quoted
by S. S. Wilks in "Undergraduate Statistical Education",
Journal of the American Statistical Association, vol. 46,
No. 253, March 1951).

Let us suppose that you, the reader, have been appointed to
be one of the M.C.C. Selectors responsible for picking an English 
cricket team to tour Australia. Clearly, it would be necessary 
to start by collecting information about the play of a group of 
" possibles ". (For the moment, we shall not consider how we 
have chosen these.) We might begin by noting down each 
man's score in successive innings and by collecting bowling 
figures. Ultimately, our collection of figures would tell us 
quite a lot, though by no means everything, about the batsmen 
and bowlers, as batsmen and bowlers, on our list. The 
sequence of numbers set down against each batsman's name 
would tell us something about his run-scoring ability. It 
would not, however, tell us much, if anything, about his style— 
that, for instance, on one occasion, he scored a boundary with 
a superlative cover-drive, while, on another, although he 
scored a boundary, his stroke was certainly not superlative.
The list of number-pairs (5 wickets, 63 runs, for example)
against a bowler's name would, likewise, tell us something about 
his bowling over a certain period. Such lists of classified 
numbers are the raw stuff of statistics. 

Let x be the number of runs scored by a particular batsman 
in a given innings of a given season. Then x is a variable which 
can take any positive integral value, including zero. More 
precisely, x is a discrete or discontinuous variable, because the 
smallest, non-zero, difference between any two possible values 
of x is a finite amount (1 in this case). Not all variables are, 
however, of this kind. Let, now, x denote the average number 
of runs scored off a particular bowler per wicket taken by him 
on a given occasion his side fielded in a specified season. Then 
x may take on values like 12.0, 27.897, 3.333333 ..., etc.
Theoretically x can range over the entire set of positive rational
numbers,¹ not merely the positive integers, from zero to
infinity (0 wickets, 41 runs). Again, let x denote the number
of yards run by a particular fielder on a given occasion in the
field ; now x can take on any positive real number² as value,
not merely any positive rational (for instance, if a fielder walks
round a circle of radius 1 yard, he traverses 2π yards, and π,
while being a real number, is not rational). Thus the variable
can vary continuously in value and is called a continuous variable. 

Any aspect of whatever it may be we are interested in that is 
countable or measurable can be expressed numerically, and, so, 
may be represented by a variable taking on values from a 
range of values. It may be, however, that we are primarily
interested in a batsman's style rather than his score. Assume, 
then, that, together with our fellow Selectors, we have agreed 
upon some standard enabling us to label any particular stroke 
first-class or not-first-class. Although, in the common-sense
use of measure, such a difference is not measurable, we can, 
nevertheless, assign the number 1 to a first-class stroke and the 
number 0 to one that is not. Any particular innings of a given 
batsman would then be described, from this point of view, by a 
sequence of 0's and 1's, like 00010011101111010111110000000.
This, too, would constitute a set of statistical observations. 
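
As a small illustration of this coding, the sequence just given can be held as a string of 0's and 1's and summarised by a count. The sketch below uses Python, which is our choice and not part of the original text, and its variable names are hypothetical.

```python
# The innings from the text, coded stroke by stroke:
# 1 = first-class stroke, 0 = not-first-class stroke.
innings = "00010011101111010111110000000"

strokes = [int(ch) for ch in innings]

total_strokes = len(strokes)    # number of strokes observed
first_class = sum(strokes)      # number of strokes coded 1
proportion = first_class / total_strokes

print(f"{first_class} first-class strokes out of {total_strokes}")
print(f"proportion first-class: {proportion:.3f}")
```

Once a quality has been coded in this way, the counting and averaging applied to run-scores can be applied to it as well.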

¹ A rational number is any number that can be expressed in
the form r/s, where r and s are integers and s is positive.

² A real number is any number that can be expressed as a
terminating or unterminating decimal. Thus any rational number 
is a real number, but not all real numbers are rational. See T. 
Dantzig, Number, the Language of Science, or G. H. Hardy, A Course 
of Pure Mathematics. 
When we have gathered together a large collection of numeri-
cal data about our players, we must begin to " sort them out ". 

Our statistics (in the plural), the numerical data collected in 
our field of interest, requires statistical (in the singular) treat- 
ment (ordering, tabulating, summarising, etc.). Indeed, in the 
very process of collecting we have already behaved statistically 
(in the singular), for we shall have been forced to develop some 
system to avoid utter chaos. 

Thus the systematic collection of numerical data about a set 
of objects, for a particular purpose, the systematic collection of 
statistics (plural) is the first phase of Statistics (singular). 

(Warning : Thus far, we have been using the term statistics 
(plural) in the sense in which it is often used in everyday
conversation. Actually, among statisticians themselves, this use,
to denote a set of quantitatively measured data, is almost
obsolete. The plural use of statistics is confined to the plural 
of the word statistic, a rather technical word we shall explain 
later.) 

1.2. Descriptive Statistics. How do we go about sorting out
our data ? 

First, we display them so that the salient features of the 
collection are quickly discernible. This involves tabulating them
according to certain convenient and well-established principles, 
and, maybe, simultaneously presenting them in some simple, 
unambiguous diagrammatic or graphical form. 

Secondly, we summarise the information contained in the 
data, so ordered and displayed, by a single number or, more 
usually, a set of numbers (e.g., for a batsman, number of 
innings, times not out, total number of runs scored, highest 
score, average number of runs scored per completed innings). 

Which of these " summarising numbers " we use depends 
upon the question we are trying to answer about the subject 
to which the data relate. We might want to compare the 
average (arithmetic mean score) of each one of our " possible " 
batsmen with the overall average of all the batsmen in 
the group. In an attempt to assess consistency, we might 
try to arrive at some figure, in addition to the mean score, 
which would tell us how a batsman's scores in different innings 
are spread about his mean. In doing such things we should be
engaged in descriptive statistics — ordering and summarising a 
given set of numerical data, without direct reference to any 
inferences that may be drawn therefrom. 
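
The summarising numbers mentioned above can be sketched in a few lines of Python. The scores are invented for illustration, and the standard deviation is used here only as one possible figure for the spread about the mean; the text has not yet said which measure it will adopt.

```python
from statistics import mean, pstdev

# Hypothetical scores for one batsman in ten completed innings
# (invented for illustration; not data from the text).
scores = [12, 45, 0, 67, 23, 8, 101, 34, 19, 51]

average = mean(scores)    # arithmetic mean score
spread = pstdev(scores)   # standard deviation of the scores about that mean

print(f"innings: {len(scores)}, total runs: {sum(scores)}, highest: {max(scores)}")
print(f"mean score: {average:.1f}, spread (standard deviation): {spread:.1f}")
```

Two batsmen with the same mean score may differ widely in the second figure, which is why a measure of spread is wanted when consistency is in question.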

1.3. Samples and Populations. The data we have collected, 
ordered, displayed and summarised will tell us much that we 
want to know about our group of possible Test players, and 
may help us to pick the touring party. Indirectly, they will
also tell us something about the state of first-class cricket in 
England as a whole. But because the group of " possibles " 
we have been studying is a special group selected from the entire 
population, as we call it, of first-class cricketers, the picture we 
obtain from our group, or sample, will not be truly representa- 
tive of that population. To obtain a more nearly representative 
picture, we should have picked a sample, or, better still, a
number of samples, at random from the population of all 
first-class players. In this way we meet with one of the 
central ideas of the science of statistics — that of sampling a 
population. 

Ideally, of course, to obtain a really reliable picture we should 
need all the scores, bowling averages, etc., of every player in
every first-class county. But this would be, if not theoretically
impossible, certainly impracticable. So we are forced back to
the idea of drawing conclusions about the population from the 
information presented in samples — a procedure known as 
statistical inference. 

To fix our ideas, let us set down some definitions : 

Population (or Universe) : the total set of items (actual or 
possible) defined by some characteristic of those items, 
e.g., the population of all first-class cricketers in the year
1956 ; the population of all actual and possible measure-
ments of the length of a given rod ; the population of all 
possible selections of three cards from a pack of 52. A 
population, note, need not be a population of what, in 
everyday language, we call individuals. Statistically, 
we speak of a population of scores or of lengths. Such a 
population may have a finite number of elements, or may 
be so large that the number of its elements will always 
exceed any number, no matter how large, we may choose ; 
in this latter case, we call the population infinite. 

Sample : Any finite set of items drawn from a population. 
In the case of a finite population, the whole population 
may be a sample of itself, but in the case of an infinite 
population this is impossible. 

Random Sample : A sample from a given population, each 
element of which has an "equal chance" of being 
drawn. Let us say at once that this definition is open
to serious objection, for when we think about what
exactly we mean by "equal chance", we begin to suspect
that it, in its turn, may involve the very idea of random- 
ness we are trying to define. 






There are various methods by which we may obtain a 
random sample from a given population. The common 
method of drawing names from a hat occurs to us at once. 
This suffices to emphasise the very important point that — 

the adjective random actually qualifies the method of select- 
ing the sample items from the population, rather than 
designating some property of the aggregate of elements of 
the sample discovered after the sample has been drawn. 

That section of Statistics concerned with methods of drawing 
samples from populations for statistical inference is called 
Sampling Statistics. 
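
The names-from-a-hat method can be imitated on a machine. The following Python sketch assumes a hypothetical population of twenty numbered players and uses the standard library's pseudo-random generator in place of the hat; it illustrates the point that randomness belongs to the method of drawing, and it is not a procedure given in the text.

```python
import random

# Hypothetical population: twenty players identified by number.
population = [f"player_{i:02d}" for i in range(1, 21)]

rng = random.Random(1933)               # fixed seed so the draw can be repeated
sample = rng.sample(population, k=5)    # five distinct items, drawn without replacement

print(sample)
```

Nothing about the five names printed marks them as "random"; what is random is the way in which they were chosen.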

Switching to a new field, assume that from all National 
Servicemen born in a certain year, 1933, we draw a random
sample, or a number of samples, of 200. What can we infer
about the distribution of height in the population from that in 
the sample ? And how accurate will any such inference be ? 

We repeat that unless we examine all the elements of a 
population we cannot be certain that any conclusion about the
population, based on the sample-data, will be 100% accurate. 
We need not worry about this. The great mass of our know- 
ledge is probability-knowledge. We are "absolutely certain" 
that a statement is " 100% true " only in the case of statements 
of a rather restricted kind, like : I was born after my father was
born, a black cat is black, one plus one equals two. Such statements,
called tautologies by logicians, are sometimes of considerable
interest because, like the third example given, they
are in fact disguised definitions ; but none is a statement about 
the world, although at first sight it may appear to be so. We 
may be " pretty confident " that any statement saying some- 
thing remotely significant about the world is a probability- 
statement. It is for this reason, among others, that Statistics 
is important : 

" The characteristic which distinguishes the present-day 
professional statistician, is his interest and skill in the 
measurement of the fallibility of conclusions " (G. W. 
Snedecor, "On a Unique Feature of Statistics", Presidential
Address to the American Statistical Association,
December 1948, Journal of the American Statistical 
Association, vol. 44, No. 245, March 1949). 

Central, therefore, in this problem of inference from sample 
to population, statistical inference, is the concept of probability. 
Indeed, probability theory is the foundation of all statistical 
theory that is not purely descriptive. Unfortunately it is not 
possible to give a simple, universally acceptable definition of
probability. In the next chapter we shall try to clarify our
ideas a little. For the moment, we assume that we know 
roughly what the word means. 

In a sample of 200 of all National Servicemen born in 1933,
we shall find that there are so many with heights of 59 in. and
under, so many with heights exceeding 59 in. but not exceeding
60 in. and so on. Our variable here is height in inches of National
Servicemen born in 1933. It is distributed over the 200-strong
sample in a definite manner, the number of values of the variable
falling within a specified interval being called the frequency of 
the variable in that interval. We may thus set up a table 
giving the frequency of the variable in each interval for all the 
intervals into which the entire range of the variable is divided. 
We thereby obtain the frequency-distribution of the variable 
in the sample. 
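
A frequency-distribution of this kind can be tabulated mechanically. In the Python sketch below the 200 heights are simulated (the centre of 68 in. and the spread of 2.5 in. are assumptions made only to have figures to work with), and each height is assigned to a one-inch interval of the form "exceeding u - 1 but not exceeding u", with everything of 59 in. and under collected in a single class.

```python
import random
from collections import Counter
from math import ceil

rng = random.Random(0)
# Hypothetical heights in inches for a sample of 200 (simulated, not real data).
heights = [rng.gauss(68, 2.5) for _ in range(200)]

def class_label(height):
    """Upper limit u of the interval (u - 1, u] containing the height;
    all heights of 59 in. and under share the class labelled 59."""
    return max(59, ceil(height))

frequency = Counter(class_label(h) for h in heights)

for upper in sorted(frequency):
    if upper == 59:
        label = "59 and under"
    else:
        label = f"over {upper - 1}, not over {upper}"
    print(f"{label:>22}: {frequency[upper]}")
```

The frequencies printed necessarily add up to the sample size, 200.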

We now define : 

Variate : A variable possessing a frequency distribution is
usually called a variate by statisticians (a more precise
definition is given in 2.13).

Sample Statistic : A number characterising some aspect of
the sample distribution of a variate, e.g., sample mean,
sample range.

Population Parameter : A number characterising some 
aspect of the distribution of a variate in a population, 
e.g., population mean, population range. 

Using these new terms, we may now formulate the question 
raised above in this way : 

How can we obtain estimates of population parameters 
from sample statistics, and how accurate will such estimates
be? 

1.4. Statistical Models, Statistical Distributions. When we
examine actual populations we find that the variate tends to be
approximately distributed over the population in a relatively 
small number of ways. Corresponding to each of these ways, 
we set up an ideal distribution which will serve as a model for 
the type. These are our standard distributions (just as the 
equation ax² + bx + c = 0 is the standard quadratic equation,
to which we refer in the course of solving actual quadratic 
equations). Each is defined by means of a mathematical 
function called a frequency-function (or, in the case of a distribution
of a continuous variate, a probability-density-function) in
which the population parameters appear as parameters, i.e.,
as controllable constants in the function, which, when varied,
serve to distinguish the various specific cases of the general
functional form. (In the function f(x, a, b, c) = ax² + bx + c,
the constants a, b, c are parameters : when we give them 
different values, we obtain different cases of the same functional 
form.) 

Once we have defined a standard distribution, we can
sample theoretically the corresponding ideal population. It is 
then frequently possible to work out the manner in which any 
specific statistic will vary, as a result of the random-sampling 
process, with the size of the sample drawn. In other words, 
we obtain a new distribution which tells us the manner in which
the statistic in question varies over the set of all possible samples 
from the parent population. Such distributions we call 
Sampling Distributions. Thus, providing our model is appropriate,
we are able to decide how best to estimate a parameter
from the sample data and how to assess the accuracy of such 
an estimate. For example, we shall find that the mean value 
of the mean values of all possible samples from a population is 
itself the mean value of the variate in the population, and that 
the mean of a particular sample has only a specific probability 
of differing by more than a stated amount from that value. 
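
For a very small finite population the claim about the mean of the sample means can be checked by listing every possible sample. The Python sketch below assumes a hypothetical population of five values and samples of two drawn without replacement; exact fractions are used so that the equality is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of five values.
population = [3, 7, 8, 12, 20]
sample_size = 2

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

population_mean = mean(population)

# Every possible sample of the given size, drawn without replacement.
sample_means = [mean(s) for s in combinations(population, sample_size)]
mean_of_sample_means = mean(sample_means)

# Proportion of samples whose mean differs from the population mean by more than 3.
far = sum(1 for m in sample_means if abs(m - population_mean) > 3)
probability_far = Fraction(far, len(sample_means))

print("population mean:     ", population_mean)
print("mean of sample means:", mean_of_sample_means)
print("P(|sample mean - population mean| > 3):", probability_far)
```

The list `sample_means` is, in miniature, a sampling distribution: it shows how the statistic varies over the set of all possible samples from the parent population.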

In this way, we are led to the idea of statistics or functions of 
statistics as estimators of population parameters, and to the 
closely related idea of confidence limits — limits, established from 
the sample data with the help of our knowledge of some model 
distribution, outside which a particular parameter will lie with 
a probability of only " one in so many ". 

1.5. Tests of Significance. Very often we are not primarily 
concerned with the values of population parameters as such. 
Instead, we may want to decide whether or not a certain 
assumption about a population is likely to be untenable in the 
light of the evidence provided by a sample or set of samples 
from that population. This is a very common type of problem 
occurring in very many different forms : 

Is it reasonable to assume, on the basis of the data pro- 
vided by certain samples, that a certain modification to a 
process of manufacturing electric-light bulbs will effec- 
tively reduce the percentage of defectives by 10%? 

Samples of the eggs of the common tern are taken from 
two widely separated nesting sites : is it reasonable to 
assume, on the evidence of these samples, that there is no 
difference between the mean lengths of the eggs laid by
birds in the two localities? 
Consider the light-bulb problem. Previous sampling may
have led us to suspect that the number of defective light bulbs 
in a given unit of output, 10,000 bulbs, say, is too high for the 
good name of the firm. Technical experts have suggested a 
certain modification to the production process. This modifica- 
tion has been agreed upon, introduced and production has been 
resumed. Has the modification been successful? 

The assumption that the modification has reduced the 
number of defectives by 10% is a hypothesis about the dis- 
tribution of defectives in the population of bulbs. If the 
hypothesis is true, we may conclude, from our knowledge of 
the appropriate model distribution, that, if we draw from each
10,000 bulbs produced, a random sample of, say, 100 bulbs, the 
probability of obtaining 5 or more defectives in such a sample 
will be 1 in 20, i.e., we should expect not more than 1 sample in
every 20 to contain 5 or more defectives : if, then, we find that 
3 samples, say, in 20 contain 5 or more defectives, an event 
improbable at the level of probability chosen (in this case 1/20 or
0.05) has occurred. We shall have reason, therefore, to
suspect our hypothesis — that the modification has reduced the 
number of defectives by 10%. 
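
The text names neither the model distribution nor the defect rate behind the figure of 1 in 20. As a sketch only, suppose the number of defectives in a sample of 100 follows a binomial distribution and that, under the hypothesis, 2% of bulbs are defective. Both are our assumptions, the rate being chosen because it gives a probability close to the one quoted; strictly, drawing 100 bulbs from 10,000 without replacement is not quite binomial, but the difference is small.

```python
from math import comb

def binomial_tail(n, p, k):
    """Probability of k or more 'successes' in n independent trials,
    each with success probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

n = 100            # sample size, as in the text
p_assumed = 0.02   # ASSUMPTION: defect rate under the hypothesis (not given in the text)

tail = binomial_tail(n, p_assumed, 5)
print(f"P(5 or more defectives in {n}) = {tail:.4f}")
```

Under these assumptions the probability comes out at roughly one in twenty, which is the sense in which 5 or more defectives in a sample is an "improbable" event at the 0.05 level.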

Now the level of probability chosen is essentially arbitrary. We
might well have made the test more exacting and asked of our 
model what number of defectives in a sample of the size 
specified is likely to be attained or exceeded with a probability 
of 0.01. In other words, if our hypothesis is true, what is the
value of m such that we should expect only 1 sample in 100 to 
contain m or more defectives ? Had we chosen this, 0.01, level
and found that more than 1 in 100 samples contained this 
number of defectives or more, a very improbable event would 
have occurred. We should then be justified in very strongly 
suspecting the hypothesis. 
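
The value of m for a chosen level can be found by trying successive counts until the tail probability falls to the level or below it. The sketch keeps the same assumptions as before, which are ours and not the text's: a binomial model for the number of defectives in a sample of 100, and a defect rate of 2% under the hypothesis. The function names are hypothetical.

```python
from math import comb

def binomial_tail(n, p, k):
    """Probability of k or more 'successes' in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def critical_count(n, p, level):
    """Smallest m such that the probability of m or more successes
    does not exceed the chosen level."""
    for m in range(n + 2):          # at m = n + 1 the tail is zero, so the loop always returns
        if binomial_tail(n, p, m) <= level:
            return m

n = 100            # sample size, as in the text
p_assumed = 0.02   # ASSUMPTION: defect rate under the hypothesis (not given in the text)

m_01 = critical_count(n, p_assumed, 0.01)
print(f"at the 0.01 level, m = {m_01}")
print(f"P({m_01} or more defectives) = {binomial_tail(n, p_assumed, m_01):.4f}")
```

A more exacting level pushes m upwards: a count that was improbable enough at the 0.05 level may not be improbable enough at the 0.01 level.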

Such tests are tests of significance. The hypothesis to be
tested is a — 

Null Hypothesis, because it is to be nullified if the evidence 
of random sampling from the population specified by the 
hypothesis is " unfavourable " to that hypothesis. 

We decide what shall be considered " unfavourable " or " not 
unfavourable " by choosing a level of significance (level of prob- 
ability). It is up to us to fix the dividing line between " un- 
favourable " and " not unfavourable ". In practice, the levels 
chosen are frequently the 0.05 and 0.01 levels. But any level
may be chosen. 

In all this, however, there is need to remember that it is 
always dangerous to rely too much on the evidence of a single
experiment, test or sample. For, as we shall see, although 
each test in a series of tests may yield a non-significant result, 
it is quite possible that, when we pool the data of each test, and 
subject this pooled data to the test, the result may actually be 
significant at the level selected. Again, although a significant 
result is evidence for suspecting the hypothesis under test, a 
non-significant result is no evidence of its truth : the alternative
is not one between "Guilty" and "Not guilty", "False" or
"True", but, rather, between "Guilty" and "Not proven",
"False" and "Not disproved".

Problems such as these we have rapidly reviewed are, then, a 
few of the many with which statistics is concerned. Statistics 
is a tool, a very powerful tool, in the hands of the scientist, 
technologist, economist and all who are confronted with the 
job of taking decisions on the basis of probability statements. 
But any tool is limited in its uses, and all tools may be misused. 
The more we know about a tool, of the principles underlying its 
operation, the better equipped we are both to employ it 
effectively and to detect occasions of its misuse and abuse. 
Here we attempt to make a small start on the job of under- 
standing. 
FREQUENCIES AND PROBABILITIES

2.1. Frequency Tables, Histograms and Frequency Polygons. 

I take six pennies, drop them into a bag, shake them well and
empty the bag so that the pennies fall on to the table. I note 
the number of heads shown, and repeat the experiment. When 
I have carried out the routine 200 times, I make out the follow- 
ing table showing upon how many occasions out of the 200 all 
the pennies showed tails (no heads), one out of the six pennies 
showed a head, two pennies showed heads and so on : 



Number of Heads (H) |   0    1    2    3    4    5    6 | Total
Frequency (f)       |   2   19   46   62   47   20    4 |   200



Such a table summarises the result of the experiment, and is 
called a Frequency Table. It tells at a glance the number of 
times (frequency) a variable quantity, called a variate (in this 
case H, the number of heads shown in a single emptying of the 
bag), takes a specified value in a given total number of occasions 
(the total frequency : here, 200). In this case the variate is 
discontinuous or discrete, being capable of taking certain values 
only in the range of its variation. But not all variates are of 
this kind. Consider the following frequency table showing the 
distribution of length of 200 metal bars : 



Length (L)    |  30   31   32   33   34   35   36   37   38   39 | Total
Frequency (f) |   4    8   23   35   62   44   18    4    1    1 |   200



Here the lengths have been measured correct to the nearest 
inch, and all bars having lengths in the range 34.5000 ... to
35.4999 ... are included in the 35-in. class. In other words,
the variate (L) here could have taken any value between say
29.5000 ... and 39.4999 ... inches. The variate is, in
other words, a continuous variate, but for convenience and 
because no measurement is ever " exact ", the frequencies have 
been grouped into classes corresponding to equal subranges of 
the variate and labelled with the value of the mid-point of the
class interval. But although the variate, L, could have taken
any value in its range, in practice a distribution of observed 
frequencies only covers a finite number of values of the variate, 
although this may at times be very large. Thus in a sense, this
second distribution may also be regarded as a discrete distribution,
especially as the frequencies have been grouped.

[Fig. 2.1.1 (a). Frequency Diagram (frequency polygon). Horizontal axis: number of heads.]
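
The pennies experiment is easily imitated, and the imitation shows how a frequency table is built up from the separate observations. The Python sketch below assumes fair pennies and uses a pseudo-random generator with an arbitrary fixed seed; its frequencies will not reproduce those recorded in the text, which came from an actual experiment.

```python
import random
from collections import Counter

rng = random.Random(6)   # arbitrary fixed seed so the run can be repeated
TRIALS = 200             # number of times the bag is emptied
PENNIES = 6              # pennies in the bag

# Each trial: toss six fair pennies and count the heads shown.
heads_per_trial = [
    sum(rng.random() < 0.5 for _ in range(PENNIES)) for _ in range(TRIALS)
]

table = Counter(heads_per_trial)   # value of the variate H -> frequency f

print("H    f")
for h in range(PENNIES + 1):
    print(f"{h}  {table[h]:3d}")
print("Total", sum(table.values()))
```

Whatever the individual frequencies turn out to be, they add up to the total frequency, here 200.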

How do we display such distributions diagrammatically ? 

(a) The Histogram. On a horizontal axis mark out a number 
of intervals, usually of equal length, corresponding to the values 
taken by the variate. The mid-point of each such interval is 
labelled with the value of the variate to which it corresponds. 
Then, upon each interval as base, erect a rectangle the area of 
which is proportional to the frequency of occurrence of that
particular value of the variate. In this way we obtain a diagram 
built up of cells, called, from the Greek for cell, a histogram. 

The area of each cell, we emphasise, measures the frequency of 
occurrence of the variate in the interval upon which it is based.



[Fig. 2.1.1 (b). Histogram. Horizontal axis: bar length in inches (to nearest inch).]



Of course, if, as is often the case, all the intervals are of equal 
length, the height of each cell serves as a measure of the corre- 
sponding frequency ; but this is, as it were, only accidental. 

The area of all the cells taken together measures the total 
frequency. 

Figs. 2.1.1 (a) and (b) show the histograms for our two
distributions. It should be noted that, whereas in the case of
the first distribution, when the variate was really discrete,
each class-interval represents a single value, in the case of
the second distribution, each class value represents in fact a
range of values, the mid-point of which is used to denote the
interval.

(b) Frequency Polygon. Alternatively, at the mid-point of 
each interval, erect an ordinate proportional in length to the 
frequency of the variate in that interval. Now join together by 
straight-line segments the upper terminal points of neighbour- 
ing ordinates. The figure so obtained is a frequency polygon 
(see Fig. 2.1.1 (a)). 

2.2. Cumulative Frequency Diagrams. We may be interested
more in the frequency with which a variate takes values equal
to or less than some stated value rather than in the frequency
with which it takes individual values. Thus, for example, we
may want to show diagrammatically the frequencies with
which our six pennies showed three heads or less or five heads
or less. To do this we set up a cumulative frequency diagram in
either of the two following ways:

(a) On each interval corresponding to the different values of
the variate, set up rectangles in area proportional to the
combined frequency of the variate in that interval and in
all those corresponding to lower values of the variate.
Thus, in our pennies example, on the "0" interval,
set up a rectangle of area 2, on the "1" interval, a
rectangle of area 2 + 19 = 21, on the interval "2" a
rectangle of area 2 + 19 + 46 = 67 and so on. The area
of the rectangle set up on the last interval will measure
the total frequency of the distribution. The diagram
shown in Fig. 2.2.1 results.

(b) Alternatively, at the mid-point of each interval, erect
an ordinate measuring the "accumulated" frequency
up to and including that value of the variate. Join the
upper end-points of neighbouring ordinates. The
resulting figure is a cumulative frequency polygon (see
Fig. 2.2.1).
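
The accumulated frequencies needed for either construction are simply running totals. As a minimal sketch in Python (the variable names are our own, and the frequencies are those of the pennies distribution as used in 2.5), they might be obtained as follows:

```python
from itertools import accumulate

# Frequencies of 0, 1, ..., 6 heads in the 200 throws of six pennies.
heads = [0, 1, 2, 3, 4, 5, 6]
freq = [2, 19, 46, 62, 47, 20, 4]

# Running totals: the frequency of "this many heads or fewer".
cum_freq = list(accumulate(freq))

for h, c in zip(heads, cum_freq):
    print(f"{h} heads or fewer: {c}")
```

The last running total is the total frequency, which is the area of the last rectangle in (a) and the height of the last ordinate in (b).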

Exercise: Draw a cumulative frequency histogram and polygon for
the data given in the second table in 2.1.

2.3. Samples and Statistics. Theoretically, the experiment
with the six pennies could have been continued indefinitely.
Consequently the actual results obtained may be regarded as
those of but one sample of 200 throws from an indefinitely
large population of samples of that size. Had we performed
the experiment again, emptying the bag another 200 times, we
should have obtained a somewhat different frequency
distribution, that of another sample of 200 throws. Had we made
350 throws, we should have obtained yet another distribution,
this time of a sample of 350.

[Fig. 2.2.1. Cumulative Frequency Diagram; horizontal axis:
number of heads, 0 to 6.]

We therefore require some method of describing frequency
distributions in a concentrated fashion, some method of
summarising their salient features by, say, a set of "descriptive
numbers". As we said in Chapter One we call these "descriptive
numbers" statistics when they describe the frequency
distribution exhibited by a sample, and parameters when they
describe that of a population. For the present, we concern
ourselves with samples only.

2.4. Mode and Median.

The MODE of a frequency distribution is that value of the 
variate for which the frequency is a maximum. 

The mode of the first distribution in 2.1 is H = 3 and the
modal frequency is 62. The mode of the second distribution is
L = 34 in. and the modal frequency 62. Many distributions
are unimodal, that is there is only one value of the variate in its
total range for which the frequency is a maximum. But there
are distributions showing two or more modes (bimodal,
multimodal distributions).

The MEDIAN is that value of the variate which divides 
the total frequency in the whole range into two equal 
parts. 

In our coin-throwing experiment both the 100th and 101st
throws, when the throws are considered as arranged in order of
magnitude of variate-value, fall in the "3" class. Since the
variate can only take integral values between 0 and 6, the
median value is 3 and the class H = 3 is the median class. On
the other hand, in the bar-length distribution the frequencies
are grouped into classes, but it is possible for a bar to have any
length in the range. If we think of the bars as arranged in
order of increasing length, the 100th and 101st bars fall in the
34-in. group, and L = 34 is accordingly the median value.
Suppose, however, the 100th bar had fallen in the 34-in. group
and the 101st bar in the 35-in. group, we should say that the
median value was between 34 and 35 in. On the other hand,
had our total frequency been 201, the 101st bar length would
be the median value, and the median group could easily have
been found.

To find an approximation to the median length, suppose that
the median group is the 34-in. group (33.5-34.5 in.), that the
cumulative frequency up to and including the 33-in. group is
98, and that the frequency of the 34-in. group is 14. Let the
median value be $L_m$. Then the difference between the median
value and that of the lower end-point of the 34-in. interval will
be $L_m - 33.5$. The median cumulative frequency is 100; the
difference between this and the cumulative frequency at the
lower end-point is 100 - 98, while the frequency in the 34-in.
class is 14. Consequently it is reasonable to write

$$\frac{L_m - 33.5}{34.5 - 33.5} = \frac{100 - 98}{14}$$

or $L_m = 33.5 + 0.143 = 33.6$, correct to 1 d.p.
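
The proportional argument above may be written as a small function. The following Python sketch is illustrative only: the function name and its parameters are our own, and the figures passed to it are the supposed ones of the preceding paragraph (98 below the class, 14 within it).

```python
def interpolated_median(lower, width, cum_below, class_freq, total):
    """Median by linear interpolation within the median class.

    lower      -- lower end-point of the median class
    width      -- length of the class interval
    cum_below  -- cumulative frequency up to the lower end-point
    class_freq -- frequency within the median class
    total      -- total frequency of the distribution
    """
    half = total / 2  # the median cumulative frequency
    return lower + width * (half - cum_below) / class_freq


# Supposed figures from the text: 34-in. class, 98 below it, 14 within it.
L_m = interpolated_median(lower=33.5, width=1.0, cum_below=98,
                          class_freq=14, total=200)
print(round(L_m, 1))
```

The rule assumes that the observations are spread evenly through the median class, which is the same assumption as is made in joining the points of a cumulative frequency polygon by straight lines.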

Alternatively, the median value may be estimated graphically 
by using the cumulative frequency polygon for the distribution. 
The last ordinate in such a diagram measures the total frequency 
of the distribution. Divide this ordinate into 100 equal parts 
and label them accordingly. Through the 50th division draw 
a horizontal line to meet the polygon ; then through this point 
of intersection draw a vertical line to cut the axis of variate 
values. The point where it cuts this axis gives an approxima- 
tion to the median value. 

Exercise: Find the median value for the distribution whose frequency
polygon was drawn as an exercise in 2.2.

2.5. The Mean. More important than either of the two
preceding statistics is the arithmetic mean, or briefly, the mean
of the distribution.

If the variate $x$ takes the values $x_i$ with frequencies $f_i$
respectively ($i = 1, 2, 3, \ldots, k$), where $\sum_{i=1}^{k} f_i = N$,
the total frequency, the mean, $\bar{x}$, is defined by

$$N\bar{x} = \sum_{i=1}^{k} f_i x_i \tag{2.5.1}$$

Thus the mean number of heads shown in our penny-distribution
is given by

$$200\bar{H} = 2 \times 0 + 19 \times 1 + 46 \times 2 + 62 \times 3 + 47 \times 4 + 20 \times 5 + 4 \times 6 = 609$$

or $\bar{H} = 3.045$.
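
The definition (2.5.1) translates directly into a few lines of code. This is a minimal Python sketch with names of our own choosing, applied to the penny distribution:

```python
# Number of heads and the frequency with which each number occurred.
heads = [0, 1, 2, 3, 4, 5, 6]
freq = [2, 19, 46, 62, 47, 20, 4]

N = sum(freq)                                            # total frequency
first_moment = sum(f * x for f, x in zip(freq, heads))   # N times the mean
mean_heads = first_moment / N

print(N, first_moment, mean_heads)
```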



Frequently much arithmetic may be avoided by using a
working mean. We use this method to calculate the mean of
our second distribution in 2.1.

Examining the frequency table, we see that the mean will
lie somewhere in the region of 34 in. Consequently, let
$x = L - 34$. Then $\sum f_i L_i = \sum f_i (x_i + 34) = \sum f_i x_i + 34 \sum f_i$
or, dividing by $\sum f_i = 200$, $\bar{L} = \bar{x} + 34$. We therefore
set up the following table:

       L        f     x = L - 34       fx
      30        4         -4          -16
      31        8         -3          -24
      32       23         -2          -46
      33       35         -1          -35
      34       62          0            0
      35       44          1           44
      36       18          2           36
      37        4          3           12
      38        1          4            4
      39        1          5            5
    Total     200                 -121 + 101 = -20

$\therefore \sum fx = -20$; $\bar{x} = -20/200 = -0.1$
or $\bar{L} = \bar{x} + 34 = 33.9$
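
The saving in arithmetic can be checked by computing the mean both ways. The Python sketch below is illustrative (the names are our own); it uses the bar-length frequencies of the table and the working mean of 34 in.

```python
lengths = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]   # class mid-points, inches
freq = [4, 8, 23, 35, 62, 44, 18, 4, 1, 1]
working_mean = 34

N = sum(freq)

# Deviations from the working mean are small integers, so the sum is easy.
sum_fx = sum(f * (L - working_mean) for f, L in zip(freq, lengths))
mean_via_working = working_mean + sum_fx / N

# The direct calculation, for comparison.
mean_direct = sum(f * L for f, L in zip(freq, lengths)) / N

print(sum_fx, mean_via_working, mean_direct)
```

Both routes must give the same value, since shifting the origin by 34 merely adds 34 to the mean.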



The reader will notice that, since in the histogram of a
distribution, the frequency $f_i$ in the $x_i$ class is represented by
the area of the corresponding rectangular cell, $N\bar{x} = \sum f_i x_i$ is
the first moment of the area of the histogram about $x = 0$.
Consequently, the mean of a distribution, $\bar{x}$, is the abscissa, or
x-co-ordinate, of the centroid of the area of the histogram (see
Abbott, Teach Yourself Calculus, Chapter XVII).

2.6. Measures of Spread. Two distributions may have the
same mean value, same mode and same median, but differ from
each other according as the values of the variate cluster closely
around the mean or are spread widely on either side. In
addition, therefore, to statistics of position or of central
tendency (mean, mode and median), we require additional
statistics to measure the degree to which the sample values of
the variate cluster about their mean or spread from it. The
range, the difference between the greatest and least values
taken by the variate, is, of course, such a statistic, but two
distributions having the same mean and range may yet differ
radically in their "spread".

If we refer back to the method of finding the median
graphically from a cumulative frequency polygon, we see that this
method can also be used to find the value of the variate below
which any given percentage of the distribution lies. Such
values are called percentiles, the p-percentile being that
value of the variate which divides the total frequency in the
ratio p : 100 - p. Thus the median is the 50th percentile.
Together, the 25th percentile, the median and the 75th percentile
quarter the distribution. They are, therefore, known as
quartiles. The difference between the 75th and 25th
percentile is the inter-quartile range. This and the
semi-inter-quartile range are often useful as practical measures of
spread, for the smaller the inter-quartile range the more closely
the distribution clusters about the median.
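
As a rough arithmetical counterpart to the graphical method, percentiles of a grouped distribution may be estimated by interpolating within the class in which the required cumulative frequency falls. The Python sketch below is our own illustration, not a prescribed method: it assumes the interpolation rule used for the median in 2.4, with cumulative frequencies reckoned at the class boundaries, and applies it to the bar-length data.

```python
def percentile(lower_bounds, width, freq, p):
    """p-th percentile of a grouped distribution by linear interpolation."""
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    total = sum(freq)
    target = total * p / 100      # cumulative frequency to be reached
    cum = 0                       # cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freq):
        if f > 0 and cum + f >= target:
            return lower + width * (target - cum) / f
        cum += f
    raise ValueError("the distribution contains no observations")


lengths = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
freq = [4, 8, 23, 35, 62, 44, 18, 4, 1, 1]
lower_bounds = [L - 0.5 for L in lengths]   # e.g. the 34-in. class starts at 33.5

q1 = percentile(lower_bounds, 1.0, freq, 25)
median = percentile(lower_bounds, 1.0, freq, 50)
q3 = percentile(lower_bounds, 1.0, freq, 75)
iqr = q3 - q1

print(round(q1, 2), round(median, 2), round(q3, 2), round(iqr, 2))
```

A reading taken from a cumulative frequency polygon drawn on class mid-points, as described in 2.2, will differ slightly from these values.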

More important theoretically as a measure of spread, is a
statistic called the variance. Using the notation in which we
defined the mean, the variance, $s^2$, is given by

$$Ns^2 = \sum_{i=1}^{k} f_i (x_i - \bar{x})^2 \tag{2.6.1}$$

In words, it is the mean squared deviation from the mean of
the sample values of the variate. If we think in terms of
moments of area of the histogram of the distribution, $Ns^2$ is
the second moment of area about the vertical axis through the
centroid of the histogram. Thus $s$, commonly called the
standard deviation of the distribution (it is the root mean square
deviation), corresponds to the radius of gyration of the
histogram about this axis (see Abbott, Teach Yourself Calculus,
Chapter XVII).

Expanding the right-hand side of (2.6.1),

$$Ns^2 = \sum_{i=1}^{k} f_i (x_i^2 - 2\bar{x}x_i + \bar{x}^2) = \sum_{i=1}^{k} f_i x_i^2 - 2\bar{x}\sum_{i=1}^{k} f_i x_i + N\bar{x}^2.$$

But $N\bar{x} = \sum_{i=1}^{k} f_i x_i$ and, therefore,

$$Ns^2 = \sum_{i=1}^{k} f_i x_i^2 - 2N\bar{x}^2 + N\bar{x}^2 = \left(\sum_{i=1}^{k} f_i x_i^2\right) - N\bar{x}^2$$

or

$$s^2 = \left(\sum_{i=1}^{k} f_i x_i^2 / N\right) - \bar{x}^2 \tag{2.6.2}$$

This is, generally, a more convenient expression to use, when
calculating the variance of a frequency-distribution, than
(2.6.1).

Now let $m$ be any value of the variate. We have

$$Ns^2 = \sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \sum_{i=1}^{k} f_i \left(x_i - m - (\bar{x} - m)\right)^2$$

$$= \sum_{i=1}^{k} f_i (x_i - m)^2 - 2(\bar{x} - m)\sum_{i=1}^{k} f_i (x_i - m) + N(\bar{x} - m)^2$$

$$= \sum_{i=1}^{k} f_i (x_i - m)^2 - N(\bar{x} - m)^2$$

or

$$\sum_{i=1}^{k} f_i (x_i - m)^2 = Ns^2 + N(\bar{x} - m)^2 \tag{2.6.2(a)}$$

This equation shows how we may calculate the variance
using a working mean, $m$. It also shows that the sum of the
squared deviations from the true mean is always less than that
of the squared deviations from any other value. It is, in fact,
the analogue of the so-called Parallel Axis Theorem for
Moments of Inertia (see Abbott, Teach Yourself Calculus, p. 313).
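
The identity (2.6.2(a)) is easily verified numerically. In the Python sketch below, which is merely illustrative, the penny distribution is used and the trial values of $m$ are arbitrary choices of our own:

```python
heads = [0, 1, 2, 3, 4, 5, 6]
freq = [2, 19, 46, 62, 47, 20, 4]

N = sum(freq)
mean = sum(f * x for f, x in zip(freq, heads)) / N


def sum_sq_dev(m):
    """Sum of the squared deviations of the sample values from m."""
    return sum(f * (x - m) ** 2 for f, x in zip(freq, heads))


N_s2 = sum_sq_dev(mean)   # N times the variance, by (2.6.1)

for m in (0, 2, 3, 4.5):
    lhs = sum_sq_dev(m)
    rhs = N_s2 + N * (mean - m) ** 2
    print(m, round(lhs, 6), round(rhs, 6))
```

For every trial value the two sides agree, and the left-hand side exceeds $Ns^2$ by the non-negative amount $N(\bar{x} - m)^2$.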

We now calculate the variance of the distribution of
bar-lengths, the mean of which we have already found. We
extend the table set out in 2.5 as follows:

Distribution of bar-lengths in 200 bars
(Working mean = 34 in.)

       L        f     x = L - 34     x²       fx      fx²
      30        4         -4         16      -16       64
      31        8         -3          9      -24       72
      32       23         -2          4      -46       92
      33       35         -1          1      -35       35
      34       62          0          0        0        0
      35       44          1          1       44       44
      36       18          2          4       36       72
      37        4          3          9       12       36
      38        1          4         16        4       16
      39        1          5         25        5       25
    Total   200 = N                          -20      456

    (Σfx = -121 + 101 = -20; Σfx² = 456)

From 2.6.2(a), $s^2 = \sum f(L - 34)^2/N - (\bar{L} - 34)^2 = \frac{456}{200} - (0.1)^2 = 2.27$

and $s = 1.51$

We shall see later (2.15) that in the case of grouped distributions
of a continuous variate (as here), a small correction to the
variance is necessary to compensate for the fact that the
frequencies are grouped. The variance of the present distribution,
so corrected, is 2.20 correct to 2 d.p.
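
The tabular calculation of the uncorrected variance can be reproduced in a few lines. This Python sketch is illustrative only; the names are our own, and no grouping correction is applied.

```python
from math import sqrt

lengths = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
freq = [4, 8, 23, 35, 62, 44, 18, 4, 1, 1]
working_mean = 34

N = sum(freq)
dev = [L - working_mean for L in lengths]          # x = L - 34

sum_fx = sum(f * x for f, x in zip(freq, dev))
sum_fx2 = sum(f * x * x for f, x in zip(freq, dev))

x_bar = sum_fx / N                                  # mean of the deviations
variance = sum_fx2 / N - x_bar ** 2                 # by (2.6.2(a)) with m = 34
std_dev = sqrt(variance)

print(sum_fx, sum_fx2, round(variance, 2), round(std_dev, 2))
```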

2.7. Moments of a Frequency Distribution. The quantity
$Ns^2$ is the second moment of the distribution about its mean.
We shall find it useful to extend this idea and define the higher
moments of such distributions.

For the present we shall assume that we are dealing with
finite sample distributions of a discrete variate.

The rth moment of a distribution about its mean, or the rth
mean-moment, $m_r$, is defined by

$$Nm_r = \sum_{i=1}^{k} f_i (x_i - \bar{x})^r \tag{2.7.1}$$

where the variate $x$ takes the $k$ values $x_i$ ($i = 1, 2, \ldots, k$), with
frequency $f_i$ ($i = 1, 2, \ldots, k$), $\sum_{i=1}^{k} f_i = N$, the sample size or
total frequency, and $\bar{x}$ is the sample mean. The rth moment
about $x = 0$, $m_r'$, is, likewise, defined to be

$$Nm_r' = \sum_{i=1}^{k} f_i x_i^r \tag{2.7.2}$$

Expanding the right-hand side of (2.7.1) by the binomial
theorem, we have

$$m_r = m_r' - \binom{r}{1} m_1' m_{r-1}' + \binom{r}{2} (m_1')^2 m_{r-2}' - \ldots + (-1)^s \binom{r}{s} (m_1')^s m_{r-s}' + \ldots + (-1)^{r-1} \binom{r}{r-1} (m_1')^{r-1} m_1' + (-1)^r (m_1')^r \tag{2.7.3}$$

where $\binom{r}{s}$ is the binomial coefficient,
$r(r - 1)(r - 2) \ldots (r - s + 1)/s!$ or $r!/s!(r - s)!$. In particular,
since we may always combine the last two terms,

$$m_1 = 0; \quad m_2 = m_2' - (m_1')^2; \quad m_3 = m_3' - 3m_2' m_1' + 2(m_1')^3;$$

$$\text{and} \quad m_4 = m_4' - 4m_3' m_1' + 6(m_1')^2 m_2' - 3(m_1')^4 \tag{2.7.4}$$
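
The relations (2.7.4) may be checked against the definitions (2.7.1) and (2.7.2). The following Python sketch is illustrative; it uses exact fractions so that the comparison is free of rounding, and the function names are our own.

```python
from fractions import Fraction

heads = [0, 1, 2, 3, 4, 5, 6]
freq = [2, 19, 46, 62, 47, 20, 4]
N = sum(freq)


def raw_moment(r):
    """r-th moment about x = 0, as in (2.7.2)."""
    return Fraction(sum(f * x ** r for f, x in zip(freq, heads)), N)


def mean_moment(r):
    """r-th moment about the mean, as in (2.7.1)."""
    mean = raw_moment(1)
    return sum(f * (x - mean) ** r for f, x in zip(freq, heads)) / N


m1p, m2p, m3p, m4p = (raw_moment(r) for r in (1, 2, 3, 4))

# Mean-moments obtained from the moments about zero, by (2.7.4).
m2 = m2p - m1p ** 2
m3 = m3p - 3 * m2p * m1p + 2 * m1p ** 3
m4 = m4p - 4 * m3p * m1p + 6 * m1p ** 2 * m2p - 3 * m1p ** 4

print(float(m2), float(m3), float(m4))
```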
2.8. Relative Frequency Distributions. When we have a
large sample of observations, some at least of the
class-frequencies will also be great. In this situation it is often
convenient to reduce the frequency distribution to a
relative-frequency distribution. If the variate $x$ takes the value $x_i$
$f_i$ times and the total frequency $\sum_{i=1}^{k} f_i = N$, the relative
frequency of the value $x_i$ is $f_i/N$. Clearly, the total area of the
cells in a relative-frequency histogram is unity, and it follows at
once that, if we write $F_i = f_i/N$ and, so, $\sum_{i=1}^{k} F_i = 1$,

$$\bar{x} = \sum_{i=1}^{k} F_i x_i$$

and

$$s^2 = \sum_{i=1}^{k} F_i (x_i - \bar{x})^2 = \left(\sum_{i=1}^{k} F_i x_i^2\right) - \bar{x}^2$$

Thus, when we are dealing with relative frequencies, the mean
is simply the first moment of the distribution about $x = 0$ and
the variance is simply the second moment about the mean.

2.9. Relative Frequencies and Probabilities. Directly we
begin to speak of "relative frequencies" we are on the threshold
of probability theory, the foundation of statistical analysis.

Suppose that, having witnessed the measurement of the
length of each of the two hundred metal bars we have been
talking about, we are asked to predict the length of the 201st
bar. If we take into account the information provided by the
first 200 bars, we shall probably argue something like this: we
notice that, of the 200 bars already measured, 62 were in the
34-in. class, 44 were in the 35-in. class, 35 were in the 33-in.
class. The next two hundred bars are not likely to reproduce
this distribution exactly, but if the 200 bars already measured
are anything like a representative sample of the total batch of
bars, it is reasonable to assume that the distribution of length
of the next 200 will not be radically different from that of the
first 200. Of all the lengths the three occurring most frequently
in the sample are 33, 34 and 35 in. If, then, we have to plump
for any one length, we shall choose the 34-in. class, the class
with the highest relative frequency.

Suppose now that we are asked to estimate the probability
that the 201st bar will have a length falling in the 34-in. class.
In reply we should probably say that if a numerical measure of
that probability has to be given, the best we can do is to give
the fraction 62/200, for this is the relative frequency of the
34-in. class in the available sample of 200; and if the drawing of
each sample of 200 bars is indeed random, we have no reason to
expect that the relative frequency of this class in the next
sample will be greatly different.

Assume now that the next sample of 200 has been drawn and 
that the relative frequency of the 34-in. class in this sample is 
The relative frequency of this class in the combined 
distribution of 400 bars will be Hi or In assessing the 

probability that the 401st rod will fall in the 34-in. class, we 
should, presumably, use this latest figure. In so doing we are 
actually implying that, as sampling is continued, the relative 
frequency of an occurrence tends to some unique limit which is 
" the probability " of the occurrence we are trying to estimate. 

There can be little doubt that it was somewhat in this way 
that the concept of " the probability of an event E given con- 
ditions C " arose. But there are difficulties in the way of de- 
veloping a " relative-frequency " definition of that probability. 

In the first place, have we any grounds for assuming that in 
two different sampling sequences, made under exactly similar 
conditions, the relative frequency of the 34-in. class would lead 
to exactly the same limit? 

Secondly, we made the explicit assumption that the sampling
was random, that each sample was really representative of the
population of all the bars available. But, as we saw in the
first chapter, the generally accepted definition of a random
selection from a population is that it is one made from a
population all the items of which have an equal probability of being
selected. So our definition of the probability of a particular
event $E$, given a set of conditions $C$, itself depends on knowing
what you mean by the probability of another event $E'$, given
another set of conditions $C'$!

What is the chance that when you toss a penny it will " show 
heads " ? More precisely, what is the probability of a head in 
a single toss of a penny? In trying to answer this question, 
we might argue as follows : 

There are only two possible outcomes (assuming that
the penny does not land standing on its edge!): one is
that it will land showing a head, the other that it will
show a tail. If the coin is perfectly symmetrical and
there is no significant change in the method of tossing,
a head or a tail is equally likely. There is then one
chance in two that a head will show. The required
measure of the probability of a head in a single throw of
the penny is then 1 in 2 or 1/2.

But even this "geometrical" line of argument is open to
criticism. Once again that haunting "equally likely" has
cropped up,¹ but, apart from the suspicion of circularity, how
do we know that the coin is "perfectly symmetrical", is, in
fact, unbiased? Surely the only way to test whether it is or
not is to make a sequence of tosses to find out whether the
relative frequency of a head ultimately tends to equality with
the relative frequency of a tail. So we are back again at the
relative frequency position! But, assuming that the coin is
asymmetrical, is biased, how are we to estimate the probability
of a head in a single throw without recourse to some relative
frequency experiment? A battered coin can have a very
complicated geometry! On the other hand, the "perfectly
symmetrical" coin and the "completely unbiased" die do
not exist except as conceptual models which we set up for the
purposes of exploration and analysis. And the very process
of setting up such models entails assigning precise
probability-measures: in saying that a coin is "perfectly symmetrical"
we are in fact assigning the value 1/2 to the probability of a head
and to that of a tail in a single throw, while the very term
"completely unbiased" six-faced die is only another way of
saying that the probabilities of throwing a 1, 2, 3, 4, 5, or 6
in a single throw are each equal to 1/6.

The matter is too complicated for full discussion here, but
sufficient has been said for us to adopt the following definitions:

Definition 1: If the single occurrence of a set of
circumstances, $C$, can give rise to $m$ mutually exclusive events
$E_i$ ($i = 1, 2, \ldots, m$), and if a reasonably large number, $n$,
of actual occurrences of $C$ are observed to give rise to $f_1$
occurrences of $E_1$, $f_2$ occurrences of $E_2$, and so on (where
necessarily $\sum_i f_i/n = 1$), then the probability of $E_i$, $p(E_i|C)$,
is in this situation measured, with a margin of error, by the
relative frequency $f_i/n$ ($0 \le f_i/n \le 1$), it being assumed
that, as $n$ increases, each empirical frequency tends to
stabilise. Correspondingly, in any mathematical model
we set up we assign certain probability numbers, $p_i$, to
each $E_i$, such that, for all $i$, $0 \le p_i \le 1$ and $\sum_i p_i = 1$, i.e.,
we automatically postulate a probability distribution for
the $E_i$'s in setting up the model.

1 Professor Aitken has said:

"Every definition which is not pure abstraction must appeal
somewhere to intuition or experience by using some such verbal
counter as 'point', 'straight line' or 'equally likely', under
stigma of seeming to commit a circle in definition" (Statistical
Mathematics, p. 11).

Definition 2: An event $E_2$ is said to be dependent on an
event $E_1$ if the occurrence of $E_1$ affects the probability of
the occurrence of $E_2$. If, however, the occurrence of $E_1$
does not affect the probability of $E_2$, the latter is independent
of $E_1$. If $E_2$ is independent of $E_1$ and $E_1$ is independent of
$E_2$, the events $E_1$ and $E_2$ are said to be independent. On
the other hand, if the occurrence of $E_1$ precludes the
occurrence of $E_2$ and the occurrence of $E_2$ precludes the
occurrence of $E_1$, the two events are said to be mutually
exclusive.

(In a single throw of a coin, the event " Heads " and 
the event " Tails " are mutually exclusive. On the other 
hand, the occurrence of a head in the first throw does not 
influence the outcome of a second throw, and thus the 
two events are independent. In contrast, the second of 
two successive shots at shove-halfpenny is usually de- 
pendent on the first.) 
2.10. Elementary Probability Mathematics. It follows from 
the postulated correspondence between probabilities and 
relative frequencies that 

If the two events $E_1$ and $E_2$ are mutually exclusive and
$p(E_1|C) = p_1$ and $p(E_2|C) = p_2$, then we assert that the
probability of the event "either $E_1$ or $E_2$", which we write
$p(E_1 + E_2|C)$, is

$$p_1 + p_2 \tag{2.10.1}$$

For suppose in a large number of occurrences of $C$, $n$, say, the
event $E_1$ is observed to occur $f_1$ times and $E_2$ is observed to
occur $f_2$ times, then the event "either $E_1$ or $E_2$" will have
occurred $f_1 + f_2$ times, i.e., with a relative frequency
$(f_1 + f_2)/n = f_1/n + f_2/n$. 2.10.1 is the law of addition of
probabilities for mutually exclusive events.

It follows that the probability of the non-occurrence of any
$E$, $\bar{E}$, is given by

$$p(\bar{E}|C) = 1 - p(E|C) \tag{2.10.2}$$

$\bar{E}$ is often called the event complementary to $E$. If $p(E|C)$
is $p$ and $p(\bar{E}|C)$ is $q$, then, in the case of complementary events

$$p + q = 1 \tag{2.10.3}$$

2.10.1 also gives immediately, if we have $n$ mutually
exclusive events $E_i$ with probabilities $p_i$ ($i = 1, 2, \ldots, n$), the
probability of "either $E_1$ or $E_2$ or $E_3$ or ... or $E_n$" as

$$p(E_1 + E_2 + E_3 + \ldots + E_n) = p_1 + p_2 + p_3 + \ldots + p_n = \sum_{i=1}^{n} p_i.$$

Multiplication Law: Group some of the events $E_i$ ($i = 1$
to $n$) together in a set $S_1$, group some of the remaining events
into a set $S_2$ and continue in this way until all the $E_i$'s are
exhausted. Each $S_i$ may be considered as a new event. Then
the events $S_i$ are mutually exclusive and together exhaust the
set of original $E_i$'s. Let the probability of the event
$S_i$ be $p(S_i) = p_{i.}$. Now, using the same method, group the $E_i$'s
into a different set of events $T_j$ and let $p(T_j) = p_{.j}$. In
general, any $S_i$ and any $T_j$ will have some of the $E_i$'s in
common. Denote the $E_i$'s common to $S_i$ and $T_j$ by $S_i . T_j$.
We now have another set of mutually exclusive events, the
$S_i . T_j$'s, taken over all values of $i$ and $j$, also exhausting
the $E_i$'s. Let $p(S_i . T_j) = p_{ij}$. If we consider the events
$S_i . T_j$ for which $i$ is fixed but $j$ varies, it is clear that
$\sum_j S_i . T_j = S_i$. Likewise, $\sum_i S_i . T_j = T_j$. Consequently,
$p_{i.} = \sum_j p_{ij}$ and $p_{.j} = \sum_i p_{ij}$. Also
$\sum_j (p_{ij}/p_{i.}) = 1 = \sum_i (p_{ij}/p_{.j})$ and

$$0 \le p_{ij}/p_{i.} \le 1.$$

(Every $p_{ij}/p_{i.}$ is essentially positive.) Hence it would appear
that the quantities $p_{ij}/p_{i.}$ are also probability-numbers.
To see that this is in fact the case consider the identity

$$p_{ij} = p_{i.}(p_{ij}/p_{i.}) \tag{2.10.4}$$

The corresponding frequency identity is $f_{ij} = f_{i.}(f_{ij}/f_{i.})$, but
$f_{ij}/f_{i.}$ is the relative frequency with which the event $S_i . T_j$
occurs in all the occurrences of the event $S_i$, i.e., it is the
relative frequency with which the event $T_j$ occurs on those
occasions in which $S_i$ occurs. Consequently $p_{ij}/p_{i.}$ is the
conditional probability of $T_j$ given $S_i$, i.e.,

$$p(S_i . T_j) = p(S_i) . p(T_j|S_i) \tag{2.10.5}$$

This is the probability multiplication law.

Referring to Definition 2, if $T_j$ and $S_i$ are independent,
$p(T_j|S_i) = p(T_j) = p_{.j}$ and $p(S_i|T_j) = p(S_i) = p_{i.}$.

Hence

$$p_{ij} = p_{i.} \, p_{.j} \tag{2.10.5a}$$

Consequently

If $k$ independent events $E_i$ have probabilities $p(E_i|C_i) = p_i$,
the probability that they all occur in a context situation $C$,
in which all the $C_i$'s occur once only, is

$$p(E_1 . E_2 \ldots E_k|C) = p_1 . p_2 \ldots p_k = \prod_{i=1}^{k} p(E_i|C_i) \tag{2.10.5b}$$
Example: What is the probability of throwing exactly 9 with two true
dice?

To score 9 we must throw 6 and 3, 5 and 4, 4 and 5, or 3 and 6.
If we throw any one of these we cannot throw any of the rest. The
events are, therefore, mutually exclusive. The number of possible
outcomes is 6 . 6 = 36. Therefore the required probability is 4/36
or 1/9.
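
The count of favourable outcomes can be confirmed by listing all 36 throws. A minimal Python sketch, with names of our own choosing, treats the two dice as distinguishable and every ordered pair as equally likely:

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die): 6 . 6 = 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# The mutually exclusive ways of scoring exactly 9.
favourable = [pair for pair in outcomes if sum(pair) == 9]

p_nine = Fraction(len(favourable), len(outcomes))
print(favourable, p_nine)
```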

Example: A bag contains 5 white balls and 4 black balls. One ball
is drawn at a time. What is the probability that the balls drawn
will be alternately white and black?

The probability of drawing a white ball at the first draw is 5/9,
since there are 5 white balls in the 9 to be drawn from; the
probability of then drawing a black ball is 4/8, since there are now 8
balls to draw from, of which 4 are black; the probability that the
third ball drawn will be white is, consequently, 4/7 and so on. The
required probability is, then,

$$\frac{5}{9} \cdot \frac{4}{8} \cdot \frac{4}{7} \cdot \frac{3}{6} \cdot \frac{3}{5} \cdot \frac{2}{4} \cdot \frac{2}{3} \cdot \frac{1}{2} \cdot \frac{1}{1} = \frac{1}{126}$$
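
The chain of conditional probabilities is an application of the multiplication law (2.10.5), each factor being conditional on the draws already made. The Python sketch below is illustrative and assumes, as the worked solution does, that all nine balls are drawn without replacement and that the first ball is white.

```python
from fractions import Fraction

white, black = 5, 4
prob = Fraction(1)
draw_white = True   # with 5 white and 4 black the sequence must begin with white

while white + black > 0:
    remaining = white + black
    if draw_white:
        prob *= Fraction(white, remaining)   # chance that the next ball is white
        white -= 1
    else:
        prob *= Fraction(black, remaining)   # chance that the next ball is black
        black -= 1
    draw_white = not draw_white

print(prob)
```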

Example: If the events $E_1$ and $E_2$ are neither independent nor
mutually exclusive, what is the probability that at least one of
$E_1$ and $E_2$ occurs?

Let $p(E_1) = p_1$, $p(E_2) = p_2$ and $p(E_1 . E_2) = p_{12}$. The required
probability is the probability of one of three mutually exclusive
events: either both occur, or $E_1$ occurs and $E_2$ does not, or $E_2$
occurs and $E_1$ does not.

Now the probability that $E_1$ occurs is the sum of the probabilities
that both $E_1$ and $E_2$ occur and that $E_1$ occurs and $E_2$ does not.
Consequently $p(E_1 . \bar{E}_2) = p(E_1) - p(E_1 . E_2) = p_1 - p_{12}$.
Likewise, $p(\bar{E}_1 . E_2) = p_2 - p_{12}$. Therefore, the required
probability that at least one of the two events occurs, being the sum of
the probabilities $p(E_1 . \bar{E}_2)$, $p(\bar{E}_1 . E_2)$ and $p(E_1 . E_2)$, is
given by

$$p(E_1 + E_2) = (p_1 - p_{12}) + (p_2 - p_{12}) + p_{12} = p_1 + p_2 - p_{12}.$$
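
The result may be tried on a simple case. In the Python sketch below the two events are hypothetical choices of our own, not taken from the text: with a single true die, $E_1$ is "an even number" and $E_2$ is "a number greater than 3", which are neither independent nor mutually exclusive.

```python
from fractions import Fraction

faces = range(1, 7)
E1 = {x for x in faces if x % 2 == 0}   # hypothetical event: an even number
E2 = {x for x in faces if x > 3}        # hypothetical event: greater than 3


def prob(event):
    """Probability of an event on a true die: favourable faces out of 6."""
    return Fraction(len(event), 6)


p1, p2, p12 = prob(E1), prob(E2), prob(E1 & E2)
p_at_least_one = prob(E1 | E2)          # counted directly from the faces

print(p_at_least_one, p1 + p2 - p12)
```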

Problem: If from n unlike objects, r objects are selected, in how many
ways can these r objects be ordered or arranged?

Imagine r places set out in a row. We may fill the first place in
any one of n ways. Having filled the first place, there are n - 1
objects left, and there are therefore n - 1 ways of filling the second
place. Arguing on these lines, there will be (n - r + 1) ways of
filling the last place. The total number of ways in which the r
objects may be ordered is, then, n(n - 1)(n - 2) ... (n - r + 1).
We may write this

$${}^{n}P_{r} = n!/(n - r)!$$

Problem: In how many ways may r objects be picked from n objects
regardless of order?

Let this number be x. Now x times the number of ways in which
r objects can be ordered among themselves will be the number of
arrangements of r objects selected from n objects. The number of
ways r objects may be arranged among themselves is clearly r! (by
the previous problem). Hence

$$x = n!/r!(n - r)!$$

This number is usually denoted by the symbol ${}^{n}C_{r}$ or by $\binom{n}{r}$.
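
Both results can be confirmed by direct enumeration. The Python sketch below is illustrative; the values n = 5 and r = 3 are arbitrary choices of our own, and the functions `math.perm` and `math.comb` require Python 3.8 or later.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

n, r = 5, 3   # illustrative values, not taken from the text

# Count the ordered arrangements and the unordered selections by listing them.
n_perm = len(list(permutations(range(n), r)))
n_comb = len(list(combinations(range(n), r)))

print(n_perm, factorial(n) // factorial(n - r), perm(n, r))
print(n_comb, factorial(n) // (factorial(r) * factorial(n - r)), comb(n, r))
```

Each unordered selection gives rise to r! orderings, which is the relation used in the second problem.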

Example: In how many different orders can a row of coins be placed
using 1 shilling, 1 sixpence, 1 penny and 6 halfpennies?

If we treat the halfpennies as all different the number of
arrangements is 9! But it is possible to arrange the halfpennies in 6!
different ways, all of which we now consider to be equivalent.
Consequently the required number of different orders is 9!/6! = 504.

Problem: In how many ways can n objects be divided into one group
of $n_1$ objects, a second group of $n_2$ objects and so on, there being $k$
groups in all?

We have $n_1 + n_2 + n_3 + \ldots + n_k = n$. Now the first
group may be chosen in $n!/n_1!(n - n_1)!$ ways, the second in
$(n - n_1)!/n_2!(n - n_1 - n_2)!$ ways and so on.

The total number of ways is, then, the product of all these terms:

$$\frac{n!}{n_1!(n - n_1)!} \cdot \frac{(n - n_1)!}{n_2!(n - n_1 - n_2)!} \cdots \frac{(n - n_1 - n_2 - \ldots - n_{k-1})!}{n_k!(n - n_1 - n_2 - \ldots - n_k)!}$$

Since $\sum_{i=1}^{k} n_i = n$, the term on the right of the last denominator is,
apparently, 0! Has this symbol a meaning? We have $(n - 1)! = n!/n$.
Putting $n = 1$, we find $0! = 1$, which must be taken as the
definition of the symbol 0!. Consequently, the required number of
ways is

$$\frac{n!}{n_1! \, n_2! \ldots n_k!} \quad \text{or} \quad n! \Big/ \prod_{i=1}^{k} n_i!$$

Example: In how many ways can all the letters of the word calculus
be arranged?

If all the letters were unlike there would be 8! ways. But there
are 2 C's, 2 L's and 2 U's, consequently the number of ways is

8!/2!2!2! = 7!,

since the indistinguishable C's may be arranged in 2! ways and so
also the L's and U's.
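
The same division by the factorials of the repeated items serves for both the coins and the letters. As an illustrative Python sketch (the function name is our own):

```python
from collections import Counter
from math import factorial


def distinct_arrangements(items):
    """n! / (n_1! n_2! ... n_k!) for a sequence containing repeated items."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)   # like items are interchangeable
    return total


calculus = distinct_arrangements("calculus")
coins = distinct_arrangements(
    ["shilling", "sixpence", "penny"] + ["halfpenny"] * 6
)

print(calculus, coins)
```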

2.11. Continuous Distributions. Let us return to the bars 
of metal we were measuring and, imagining now the supply to 
be inexhaustible, continue measuring bar after bar. If, 
simultaneously, we greatly increase the accuracy with which 
we measure, we shall be able progressively to reduce the range 
of our class-intervals. Since length is a continuous variable,
if we go on measuring indefinitely, none of our class intervals,
no matter how small their range, will be vacant. On the
contrary, the frequency of each interval will increase
indefinitely. If, then, $\Delta f_i$ is the relative frequency of the variate
in the interval $x_i \pm \tfrac{1}{2}\Delta x_i$, centred at $x = x_i$, the height of the
relative-frequency histogram cell based on this interval, $y_i$, say,
will be given by $y_i = \Delta f_i/\Delta x_i$. And, in the limit, we shall have
a continuous relative frequency curve, $y = \phi(x)$, such that the
relative frequency with which the variate $x$ lies within an
interval $x \pm \tfrac{1}{2}dx$ will be given by $y\,dx = \phi(x)\,dx$; but this, on
the relative-frequency method of estimating probability, is the
probability, $dp(x)$, that $x$ will lie between $x \pm \tfrac{1}{2}dx$. Thus, in
the limit, our simple, relative frequency diagram for grouped
frequencies of a continuous variate is transformed into the
population probability curve of that continuous variate. It
follows that the probability that $x$ lies within an interval
$a < x \le b$ is given by

$$P(a < x \le b) = \int_a^b \phi(x)\,dx$$

and, defining $\phi(x)$ to be zero at any point outside the range of
the variate $x$, $\int_{-\infty}^{+\infty} \phi(x)\,dx = 1$, since $x$ must lie somewhere
within its range.

In the case of a continuous variate, it is meaningless to speak
of the probability that this variate, $x$, say, shall take a specified
value $x_i$, for instance. For the number of possible values, in
any finite range of the variate, that can be taken is infinite;
and, therefore, theoretically, the probability that $x = x_i$ would
appear to be zero. Yet, clearly, it is not impossible that
$x = x_i$. We therefore confine ourselves to speaking of the
probability $dp(x)$ that $x$ lies in an interval $x \pm \tfrac{1}{2}dx$. In this
way we have

$$dp(x) = \phi(x)\,dx,$$

where $\phi(x)$ is called the probability density and defines the
particular distribution of $x$; it is measured by the ordinate at
$x$ of the probability curve $y = \phi(x)$.
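
The two integral statements of this section can be tried numerically. The sketch below is in Python and rests on assumptions of ours: the density $\phi(x) = e^{-x}$ for $x \ge 0$ is only an illustrative choice, and the names `phi` and `probability` are hypothetical. It approximates the area under the density by the midpoint rule.

```python
import math


def phi(x):
    """Assumed example density: exp(-x) for x >= 0, zero outside that range."""
    return math.exp(-x) if x >= 0 else 0.0


def probability(a, b, density=phi, steps=100_000):
    """Approximate P(a < x <= b), the integral of the density from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


# The total area should be very nearly 1 (the tail beyond 50 is negligible here).
print(probability(0.0, 50.0))
# P(1 < x <= 2) should be very nearly exp(-1) - exp(-2).
print(probability(1.0, 2.0), math.exp(-1) - math.exp(-2))
```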

2.12. Moments of a Continuous Probability Distribution. 
Just as we described a sample-frequency distribution by means 
of moments of the distribution about some specified value of 
the variate, so too we describe continuous probability distribu- 
tions. Thus, the mean of the distribution is the first moment 
about $x = 0$. Following the convention that Greek letters
denote population parameters while the corresponding Roman
letters denote the corresponding sample statistics, we write the first
moment about $x = 0$,¹

$$\mu_1' = \int_{-\infty}^{+\infty} x\,\phi(x)\,dx \qquad (2.12.1)$$

and for the $r$th moment about $x = 0$, $\mu_r'$, and the $r$th moment
about the mean, $\mu_r$,

$$\mu_r' = \int_{-\infty}^{+\infty} x^r\phi(x)\,dx; \quad\text{and}\quad \mu_r = \int_{-\infty}^{+\infty} (x - \mu_1')^r\phi(x)\,dx \qquad (2.12.2)$$

In particular the second moment about the mean is the population
variance, and we write

$$\sigma^2 \equiv \mu_2 = \int_{-\infty}^{+\infty} (x - \mu_1')^2\phi(x)\,dx = \int_{-\infty}^{+\infty} x^2\phi(x)\,dx - (\mu_1')^2 \qquad (2.12.3)$$

or $\sigma^2 \equiv \mu_2 = \mu_2' - (\mu_1')^2$ (2.12.3(a))

The probability curve $y = \phi(x)$ may be symmetrical about its
central ordinate, or it may be "skew". If it is uni-modal and
the mode lies to the right of the mean, there will be a long tail
on the negative side, and the curve is accordingly called
negatively skew; if, however, the mode lies to the left of the
mean, the curve is called positively skew. It may happen that
a mode, given in the case of a continuous curve by $dy/dx = 0$,
$d^2y/dx^2 < 0$ (see Abbott, Teach Yourself Calculus, p. 88), does
not exist. The curve is then often J-shaped, positively J-shaped
if $dy/dx$ is everywhere negative, and negatively J-shaped
if $dy/dx$ is everywhere positive (the "tail" of the distribution
being towards the positive or negative side respectively).
A U-shaped curve occurs if $d^2y/dx^2$ is everywhere positive and
$dy/dx = 0$ at some interior point of the range (see Fig. 2.12).

In order to compare the skewness of two distributions, it is
necessary to have some measure of skewness which will not
depend upon the particular units used.

One such measure (Karl Pearson's) is given by:
(Mean - Mode)/(Standard Deviation).

It is more in keeping with the use of the moments of a
distribution to describe that distribution that we should use

$$\mu_3 = \int_{-\infty}^{+\infty} (x - \mu_1')^3\,\phi(x)\,dx,$$

and, if the curve $y = \phi(x)$ is symmetrical about $x = \mu_1'$, $\mu_3 = 0$.

¹ If the range of $x$ is finite, from $x = a$ to $x = b$, for instance, we
define $\phi(x)$ to be zero for all values of $x$ outside this range.
This is easily shown by transferring to the mean as origin, i.e.,
by making the transformation $X = x - \mu_1'$, then
$\mu_3 = \int_{-\infty}^{+\infty} X^3\phi(X)\,dX$. If the curve is symmetrical about $X = 0$,
$\phi(-X) = \phi(X)$, but $(-X)^3\phi(-X) = -X^3\phi(X)$, and,
consequently, $\int_{-\infty}^{+\infty} X^3\phi(X)\,dX = 0$, in this case. If, now, the curve
is positively skew, the cubes of the positive values of $x$ together




[Fig. 2.12. Types of Distribution. Panels: positive skew; negative skew; positive J-shaped; kurtosis.]

are greater than those of the negative values and, therefore, $\mu_3$
is positive. On the other hand, if the curve is negatively skew,
the cubes of the negative values are greater than those of the
positive values, and $\mu_3$ is negative. To ensure independence of
units employed, it is necessary to divide by $\sigma^3$.

The square of this quantity, denoted by $\beta_1$, is the conventional
measure of skewness. Thus $\beta_1 = \mu_3^2/\mu_2^3$.

We use the fourth mean-moment to measure the degree to
which a given distribution is flattened at its centre (kurtosis).
This measure, denoted by $\beta_2$, is given by $\mu_4/\mu_2^2$. In the case
of the normal distribution, $\beta_2 = 3$; if, then, this distribution
is used as a standard, the quantity $\beta_2 - 3$ measures what is
called excess of kurtosis.

The corresponding sample moments may, of course, be used 
to measure the skewness and kurtosis of frequency distributions. 
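
As a hedged illustration of the sample versions of these measures, the Python sketch below computes $\beta_1 = m_3^2/m_2^3$ and $\beta_2 = m_4/m_2^2$ from a frequency table. The function name `moment_measures` and the small frequency table are our own assumptions, chosen only because the arithmetic is easy to follow by hand.

```python
def moment_measures(values, frequencies):
    """Return (mean, variance, beta_1, beta_2) for a frequency distribution."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / total

    def mean_moment(r):
        # r-th moment about the mean
        return sum(f * (x - mean) ** r for x, f in zip(values, frequencies)) / total

    m2, m3, m4 = mean_moment(2), mean_moment(3), mean_moment(4)
    return mean, m2, m3 ** 2 / m2 ** 3, m4 / m2 ** 2


# A symmetrical distribution (assumed data): beta_1 vanishes.
print(moment_measures([0, 1, 2], [1, 2, 1]))
# A distribution with a long tail on the positive side: beta_1 is positive.
print(moment_measures([0, 1, 2, 3], [4, 3, 2, 1]))
```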

2.13. Expectation. In the previous chapter we roughly 
defined a variate as a variable possessing a frequency distribu- 
tion. We can now make that definition a little more precise. 

Definition: A VARIATE or, as it is commonly called, a
random variable (or chance variable), $x$, is a variable such
that, for any given number $k$, the probability that $x$ is less
than or equal to $k$ is at least theoretically, or in principle,
calculable, i.e., a variate is defined by its associated
probability distribution.

Definition: EXPECTATION OF A VARIATE:
(1) When $x$ is a discrete variate which may take the
mutually exclusive values $x_i$ $(i = 1, 2, 3, \ldots n)$ and no
others, with respective probabilities $p(x_i)$, the expectation of
$x$, $\mathcal{E}(x)$, is given by

$$\mathcal{E}(x) = p(x_1)\,x_1 + p(x_2)\,x_2 + \ldots + p(x_i)\,x_i + \ldots + p(x_n)\,x_n$$

or

$$\mathcal{E}(x) = \sum_{i=1}^{n} p(x_i)\,x_i \qquad (2.13.1)$$

or (2) When $x$ is a continuous variate the expectation of $x$
is defined to be

$$\mathcal{E}(x) = \int_{-\infty}^{+\infty} x\,\phi(x)\,dx \qquad (2.13.2)$$

where $\phi(x)$ is the probability-density defining the
distribution of $x$.

This definition may be generalised to define the expectation
OF A CONTINUOUS FUNCTION OF $x$, $\theta(x)$, as follows:

$$\mathcal{E}(\theta(x)) = \int_{-\infty}^{+\infty} \theta(x)\,\phi(x)\,dx \qquad (2.13.3)$$






providing the integral has a finite value (see Abbott, Teach 
Yourself Calculus, pp. 227-232). 

The concept of expectation arose from gambling. Suppose
that your chance of winning a sum of money £$x_1$ is $p_1$, that of
winning £$x_2$ is $p_2$ and so on until, finally, your chance of winning
£$x_n$ is $p_n$; then, if these are the only amounts you have a chance
of winning, your expectation is

$$\mathcal{E}(x) = p_1x_1 + p_2x_2 + \ldots + p_nx_n.$$

This is, in fact, the limit of the average sum won if you were
to go on gambling indefinitely. For suppose that in $N$ "goes"
you win £$x_1$ on $n_1$ occasions, £$x_2$ on $n_2$ occasions and so on, and,
finally, £$x_k$ on $n_k$ occasions, the mean amount won (and,
remember, some or all of the $x$'s may be negative, i.e., losses!) is

$$\sum_{i=1}^{k} n_ix_i/N = \sum_{i=1}^{k} (n_i/N)\,x_i.$$

But when $N$ tends to infinity, $n_i/N$ tends to $p_i$, the probability
of winning £$x_i$. Thus

$$\mathcal{E}(x) = \lim_{N \to \infty} \sum_{i=1}^{k} n_ix_i/N.$$
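
The limiting argument can be imitated by simulation. The Python sketch below is only an illustration under assumptions of ours: the amounts and chances are invented, the names `expectation` and `average_winnings` are hypothetical, and a fixed seed is used so that the run is repeatable.

```python
import random


def expectation(amounts, probabilities):
    """Expectation of a discrete variate: the sum of p_i * x_i."""
    return sum(p * x for x, p in zip(amounts, probabilities))


def average_winnings(amounts, probabilities, goes, seed=0):
    """Mean amount won over `goes` simulated gambles."""
    rng = random.Random(seed)
    wins = rng.choices(amounts, weights=probabilities, k=goes)
    return sum(wins) / goes


# Assumed stakes: win 10 with chance 0.2, nothing with chance 0.3, lose 5 with chance 0.5.
amounts = [10, 0, -5]
probabilities = [0.2, 0.3, 0.5]

print(expectation(amounts, probabilities))          # the expectation, -0.5
for goes in (100, 10_000, 200_000):
    # The average should settle near the expectation as the number of goes grows.
    print(goes, average_winnings(amounts, probabilities, goes))
```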

Example: Show that the expectation of the number of failures
preceding the first success in an indefinite series of independent trials,
with constant probability of success, is $q/p$, where $q = 1 - p$.

The probability of 1 failure and then the success is $qp$;
that of 2 failures and then the success is $qqp = q^2p$;
that of $k$ failures and then the success is $q^kp$.

Therefore the required expectation is

$$1 \cdot qp + 2 \cdot q^2p + 3 \cdot q^3p + \ldots + k \cdot q^kp + \ldots$$
$$= qp(1 + 2q + 3q^2 + \ldots + kq^{k-1} + \ldots) = qp/(1 - q)^2$$
$$= qp/p^2 = q/p.$$
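
A partial sum of the series gives a quick numerical check of this result. The sketch below is in Python; the function name `expected_failures` and the cut-off of 10,000 terms are our own choices, adequate when $p$ is not very small.

```python
def expected_failures(p, terms=10_000):
    """Partial sum of 1*q*p + 2*q^2*p + 3*q^3*p + ..., with q = 1 - p."""
    q = 1.0 - p
    return sum(k * q ** k * p for k in range(1, terms))


for p in (0.5, 0.25, 0.1):
    # The partial sum should agree closely with q/p.
    print(p, expected_failures(p), (1.0 - p) / p)
```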

Example: A point P is taken at random in a line AB, of length $2a$,
all positions of the point being equally likely. Show that the
expected value of the area of the rectangle AP . PB is $2a^2/3$
(C. E. Weatherburn, A First Course in Mathematical Statistics).

Let $p\,dx$, where $p$ is constant, be the probability that the point P
is taken at a distance $x \pm \tfrac{1}{2}dx$ from A. Then, since P is somewhere
in AB, we have

$$\int_0^{2a} p\,dx = 1 \quad\text{or}\quad p = 1/2a.$$

The area of the rectangle AP . PB is $x(2a - x)$. Therefore the
expected value of the area is

$$\mathcal{E}(x(2a - x)) = \int_0^{2a} x(2a - x)\,\frac{dx}{2a} = (4a^3 - 8a^3/3)/2a = 2a^2/3.$$
2.14. Probability Generating Functions. Suppose that $x$ is a
discrete variate taking the values $x_i$ $(i = 1, 2, \ldots k)$, with
respective probabilities $p_i$ $(i = 1, 2, \ldots k)$. It follows from
our definition of the expectation of a function of $x$ that the
expectation of the function $t^x$ of $x$ is:

$$\mathcal{E}(t^x) = p_1t^{x_1} + p_2t^{x_2} + \ldots + p_it^{x_i} + \ldots + p_kt^{x_k} \qquad (2.14.1)$$

The coefficient of $t^{x_i}$ on the right-hand side is precisely the
probability that $x$ takes the value $x_i$.

Let us now assume that it is possible to sum the series on
the right-hand side of this equation, obtaining a function of $t$,
$G(t)$, say. Now if we can keep this function by us, so to speak,
whenever the situation supposed occurs, we have only to bring
out $G(t)$, expand it in a series of powers of $t$ and read off the
coefficient of $t^{x_i}$, say, to find the probability that, in this
situation, $x$ takes the value $x_i$. Such a function would, in
fact, generate the probabilities $p_i$ with which $x$ takes the
values $x_i$. We, therefore, call it the Probability Generating
Function for $x$ in that situation.

The corresponding expression when $x$ is a continuous variate
with probability density $\phi(x)$ is

$$G(t) = \mathcal{E}(t^x) = \int_{-\infty}^{+\infty} t^x\phi(x)\,dx \qquad (2.14.2)$$

which is clearly a function of $t$.

Definition: Whether $x$ be a discrete or continuous
variate, the function, $G(t)$, defined by

$$G(t) \equiv \mathcal{E}(t^x) \qquad (2.14.3)$$

is the Probability Generating Function for $x$ (p.g.f.
for $x$).

The most important property of generating functions lies in 
the fact that : 

When x and y are independent, discrete variates, and, 
often, when they are continuous variates, the product of 
the generating function for x and that for y is the generat- 
ing function for the new variate {x + y). 

Let us take three coins, a shilling, a sixpence and a penny, so 
worn that they are unsymmetrical. Let the probability of 
throwing a head with the shilling in a single throw be $p_1$ and
that of a tail, $q_1$. Let the corresponding probabilities for the
other coins be $p_2$, $q_2$ and $p_3$, $q_3$ respectively. When a head is
thrown let the variate $x$ take the value 1 and when a tail is
thrown the value 0. The p.g.f. for the shilling is $q_1t^0 + p_1t^1$;
that for the sixpence, $q_2t^0 + p_2t^1$, and that for the penny
$q_3t^0 + p_3t^1$. Consider the product of these p.g.f.'s:

$$(q_1t^0 + p_1t^1)(q_2t^0 + p_2t^1)(q_3t^0 + p_3t^1);$$

this may be written

$$(q_1 + p_1t)(q_2 + p_2t)(q_3 + p_3t) = q_1q_2q_3 + (p_1q_2q_3 + q_1p_2q_3 + q_1q_2p_3)t$$
$$+ (p_1p_2q_3 + p_1q_2p_3 + q_1p_2p_3)t^2 + p_1p_2p_3t^3.$$

We recognise immediately that the coefficient of $t^3$ ($x = 3$
indicates 3 heads when the coins are thrown together) is the
probability of exactly 3 heads, that of $t^2$ is the probability of 2
heads and 1 tail ($t^2 = t^{1+1+0}$), that of $t$, the probability of 1
head and 2 tails ($t = t^{1+0+0}$) and that of $t^0$ ($t^0 = t^{0+0+0}$),
the probability of all 3 coins showing tails.
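
The multiplication of the three generating functions can be carried out mechanically. In the Python sketch below each p.g.f. is stored as a dictionary mapping a power of $t$ to its coefficient; the head probabilities 0.6, 0.5 and 0.3 are invented for illustration, and the names `multiply` and `coin` are ours.

```python
from collections import defaultdict


def multiply(g, h):
    """Multiply two p.g.f.s stored as {power of t: coefficient}."""
    product = defaultdict(float)
    for power_g, coeff_g in g.items():
        for power_h, coeff_h in h.items():
            product[power_g + power_h] += coeff_g * coeff_h
    return dict(product)


def coin(p):
    """P.g.f. q*t^0 + p*t^1 of a single coin showing heads with probability p."""
    return {0: 1.0 - p, 1: p}


# Assumed head probabilities for the shilling, sixpence and penny.
heads = multiply(multiply(coin(0.6), coin(0.5)), coin(0.3))
for power in sorted(heads):
    # The coefficient of t^power is the probability of `power` heads.
    print(power, round(heads[power], 4))
```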

If now we select any one of the coins, the shilling, say, and
toss it $n$ times, the p.g.f. for $x$, the number of heads shown in
the $n$ throws, is $(q_1 + p_1t)^n$. If, however, we score in such a
way that each time a head is thrown we gain 5 points and each
time a tail turns up we lose 3, the generating function is
$(q_1t^{-3} + p_1t^5)^n$, for, in a single throw, if our variate $x$ is now the
points scored, the probability of scoring 5 is $p_1$ and that of
scoring $-3$ is $q_1$.

In the last case, let the coin be symmetrical, so that
$p = \tfrac{1}{2} = q$. Suppose it is tossed thrice. The p.g.f. for the
score will be

$$(\tfrac{1}{2}t^{-3} + \tfrac{1}{2}t^5)^3 = (\tfrac{1}{2})^3t^{-9}(1 + t^8)^3 = \tfrac{1}{8}t^{-9}(1 + 3t^8 + 3t^{16} + t^{24})$$
$$= \tfrac{1}{8}t^{-9} + \tfrac{3}{8}t^{-1} + \tfrac{3}{8}t^7 + \tfrac{1}{8}t^{15}.$$

This shows, as the reader should confirm by other methods,
that the only possible scores are $-9$, $-1$, 7 and 15, with
respective probabilities $\tfrac{1}{8}$, $\tfrac{3}{8}$, $\tfrac{3}{8}$, $\tfrac{1}{8}$.
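
One "other method" of confirming these scores is to list every possible sequence of throws. The Python sketch below does so for a symmetrical coin, using exact fractions; the function name `score_distribution` and its default arguments are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def score_distribution(throws, head_points=5, tail_points=-3):
    """Exact score probabilities for a symmetrical coin, found by listing every outcome."""
    outcome_probability = Fraction(1, 2) ** throws
    distribution = Counter()
    for outcome in product("HT", repeat=throws):
        score = sum(head_points if face == "H" else tail_points for face in outcome)
        distribution[score] += outcome_probability
    return dict(distribution)


for score, probability in sorted(score_distribution(3).items()):
    print(score, probability)   # scores -9, -1, 7, 15 with 1/8, 3/8, 3/8, 1/8
```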

It is reasonable to say, therefore, that : 

If $G_i(t)$ is the p.g.f. for $x$ in the situation $S_i$, and the
situations $S_i$ are all independent, then the p.g.f. for $x$ in the
compound situation $S_1S_2S_3 \ldots$ is $G(t) = G_1(t) \cdot G_2(t) \cdot G_3(t) \ldots$

2.15. Corrections for Groupings. When we group all the
values of a continuous variate $x$ lying between $x_i \pm \tfrac{1}{2}h$ into a
single class and treat them as all being exactly $x_i$, we distort
the true distribution of $x$. Consequently, if we calculate the
moments of the distribution from the distorted, or grouped,
distribution, the values we obtain will in general be inaccurate,
and so corrections must be applied to counteract the distortion
due to grouping. These corrections are known as Sheppard's
corrections, and may be applied only under certain conditions.
Even then they do not counteract the distortion completely,
nor do they, in any particular case, necessarily improve matters,
although they tend to do so on the average. If, then, the
terminal frequencies for the range are small (i.e., the distribution
is not, for example, J-shaped) the calculated first moment
need not be corrected, but the calculated variance should be
reduced by an amount equal to $h^2/12$, where $h$ is the length of each
class-interval. If, however, $h$ is less than one-third of the
calculated standard deviation, this adjustment makes a difference
of less than $\tfrac{1}{2}$% in the estimate of the standard deviation. If
the variate is essentially discrete, Sheppard's correction should
not be applied (see Aitken, Statistical Mathematics, pp. 44-47).
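
The size of the adjustment is easily examined. The Python sketch below (function names are ours) applies the reduction $h^2/12$ to a grouped variance and shows the proportional change it makes in the standard deviation when $h$ is one-third of that standard deviation.

```python
import math


def sheppard_variance(grouped_variance, class_width):
    """Variance after Sheppard's correction: the grouped variance less h^2/12."""
    return grouped_variance - class_width ** 2 / 12


def relative_change_in_sd(grouped_sd, class_width):
    """Proportional reduction in the standard deviation caused by the correction."""
    corrected_sd = math.sqrt(sheppard_variance(grouped_sd ** 2, class_width))
    return (grouped_sd - corrected_sd) / grouped_sd


# With h equal to one-third of the standard deviation the change is a little under 1/2%.
print(relative_change_in_sd(grouped_sd=3.0, class_width=1.0))
```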

EXERCISES ON CHAPTER TWO 

1. Measurements are made to the nearest inch of the heights of
100 children. Draw the frequency diagram of the following
distribution:

Height       60   61   62   63   64   65   66   67   68
Frequency     2    0   16   20   25   12   10    4    3

Calculate the mean and the standard deviation. (L.U.)

2. Draw a cumulative frequency polygon for the data of Question
1, and from it estimate both the median value and the standard
deviation.

3. Construct a cumulative frequency diagram from : 



Average Earnings of Women (18 and Over) in 56 Principal
Manufacturing Trades, Great Britain, April 1947

s. d. per week.

66/10   73/5    69/10   52/7    65/2    62/3    61/6    65/1
60/7    73/1    71/-    59/4    63/11   67/6    58/4    67/-
63/-    79/9    68/5    56/-    68/9    56/9    64/7    69/1
68/6    71/8    71/-    65/7    63/6    64/2    65/6    64/2
78/7    67/8    65/1    64/11   74/1    63/5    71/-    74/9
64/3    71/-    71/11   62/5    64/2    62/-    64/7    74/8
72/4    70/7    69/7    60/1    61/11   66/9    64/9    66/5

(Source: Ministry of Labour Gazette, October 1947.)

Estimate the median and quartile earnings and give a measure of
dispersion of the distribution. (L.U.)

4. Find the mean of each of the distributions:

                        % Distribution of Wives with
Wife's age              Husbands aged:
(at last birthday)      25-29       45-49

15-19                     0.8         -
20-24                    27.1         0.1
25-29                    57.8         0.7
30-34                    12.5         3.0
35-39                     1.5         9.7
40-44                     0.2        29.9
45-49                     0.1        44.3
50-54                     -          10.3
55-                       -           2.0

TOTAL                   100.0       100.0

(L.U.)

5. In the manufacture of a certain scientific instrument great 
importance is attached to the life of a particular critical component. 
This component is obtained in bulk from two sources, A and B, and
in the course of inspection the lives of 1,000 of the components from
each source are determined. The following frequency tables are
obtained:

Source A.                          Source B.
Life (hours).   No. of             Life (hours).   No. of
                components.                        components.

1,000-1,020        40              1,030-1,040       339
1,020-1,040        96              1,040-1,050       130
1,040-1,060       364              1,050-1,060        25
1,060-1,080       372              1,060-1,070        20
1,080-1,100        85              1,070-1,080       130
1,100-1,120        43              1,080-1,090       350

Examine the effectiveness of the measures of dispersion with which
you are familiar for comparing the dispersions of the two
distributions. (R.S.S.)



6. In a room containing 7 chairs, 5 men are sitting each on a 
chair. What is the probability that 2 particular chairs are not 
occupied ? (L.U.) 
7. In how many different ways can 3 letters out of 25 different
letters be arranged if any letter may be used once, twice or three
times? If two and not more than two different letters are used?
(L.U.)

8. Using the symbols given in the table below, derive expressions
for the probabilities that, of four men aged exactly 75, 80, 85 and
90 respectively (i) all will attain age 95; (ii) all will die before
attaining age 95; (iii) at least one will survive 10 years; (iv) none
will die between ages 90 and 95.

Exact age                           75     80     85     90     95
Probability of surviving 5 years    $p_0$  $p_1$  $p_2$  $p_3$  $p_4$

(I.A.)

9. From a bag containing 6 red balls, 6 white balls and 6 blue
balls, 12 balls are simultaneously drawn at random. Calculate the
probability that the number of white balls drawn will exceed the
number of red balls by at least two. (I.A.)

10. A property $E_1$ is known to be independent of a property $E_2$,
of a property $(E_2 + E_3)$ and of a property $(E_2E_3)$. Show that it
is also independent of the property $E_3$. (L.U.)

11. A probability curve, $y = \phi(x)$, has a range from 0 to $\infty$. If
$\phi(x) = e^{-x}$, sketch the curve and find the mean and variance. Find
also the third moment about the mean.

12. Prove that if $x$ and $y$ are discrete variates, $\mathcal{E}(x + y)
= \mathcal{E}(x) + \mathcal{E}(y)$, and that, if $x$ and $y$ are independent, $\mathcal{E}(xy)
= \mathcal{E}(x) \cdot \mathcal{E}(y)$.

Solutions 

1. 63.89"; 1.6". 2. Median value, 63.2".

3. Median: 05/0; semi-interquartile range ^ 3/2. 

4. 26.89; 45.14.

5. Range : (A) 120; (B) 60. Interquartile Range : (A) 27; 
(B) 40. Standard deviation : (A) 21 ; (B) 22. 

6. 1/21. 7. $25^3$; 25 × 24 × 3.

8. (i) $p_0p_1^2p_2^3p_3^4$; (ii) $[1 - p_0p_1p_2p_3][1 - p_1p_2p_3][1 - p_2p_3][1 - p_3]$;
(»>) P,P, + p,p, + pj> + ftfe; [i - p,P,'p,'(i - PJ'}. 

9. A- 

10. Full solution: I.et /:, occur ■, times out of «; E, n, times; 
AT, n, times; F,E, u„ times; AT,, «., times; AT,AT, times; 
E t E,E, times; and none of /•',, E, or AT, «„ times. Then the 
conditionsof independence arc: p(E l ) = p(E l ; E t ) p{E l ; AT, + AT,) 
= />(A-,; AT.AT,). i.e.. 

?j _ "n + »m m "it + "a + "m 

■ "i + "it + + »it» "i + "t + "it + "»» + »i» + "it. 

= ?io 
Now if $a/b = c/d = e/f = \lambda$, say, $\lambda = (c - a + e)/(d - b + f)$.
Accordingly

p(E), = „> = 1U+ ""1" , \„ = r&>; 

"1 + "is + "ts T "in 

11. Mean, 1; Variance, 1; $\mu_3 = 2$.

12. Full solution: Let $x$ take the values $x_i$ with probabilities
$p_i$ $(i = 1, 2, \ldots n)$ and let $y$ take the values $y_j$ with probabilities
$P_j$ $(j = 1, 2, \ldots m)$. Also let $\pi_{ij}$ be the probability that $x + y$
takes the value $x_i + y_j$. Clearly $x + y$ may take $nm$ values. We
have then

$$\mathcal{E}(x + y) = \sum_i\sum_j \pi_{ij}(x_i + y_j) = \sum_i\sum_j \pi_{ij}x_i + \sum_i\sum_j \pi_{ij}y_j.$$

But $\sum_j \pi_{ij} = \pi_{i1} + \pi_{i2} + \ldots + \pi_{im}$,
the sum of the probabilities that $x$ takes the value $x_i$ when $y$ takes
any one of the possible values $y_j$, and this is $p_i$. Likewise
$\sum_i \pi_{ij} = P_j$.

Hence $\mathcal{E}(x + y) = \sum_i p_ix_i + \sum_j P_jy_j = \mathcal{E}(x) + \mathcal{E}(y)$.

If now $x$ and $y$ are independent, the probability that $xy$ takes the
value $x_iy_j$ is $p_iP_j$. Therefore

$$\mathcal{E}(xy) = \sum_i\sum_j p_iP_jx_iy_j = \sum_i\sum_j (p_ix_i)(P_jy_j).$$

Summing first over $j$,

$$\mathcal{E}(xy) = \sum_i [p_ix_i\,\mathcal{E}(y)] = \mathcal{E}(y)\sum_i p_ix_i = \mathcal{E}(y) \cdot \mathcal{E}(x) = \mathcal{E}(x) \cdot \mathcal{E}(y).$$

The reader should now prove these two theorems for the case
when both $x$ and $y$ are continuous variates.
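
Both theorems may be checked on a small case. The Python sketch below assumes, purely for illustration, that $x$ is the score on a fair die and $y$ the number of heads on a fair coin, the two being independent; the joint probabilities are then the products of the separate ones. The names used are ours.

```python
from fractions import Fraction
from itertools import product


def expectation(distribution):
    """Expectation of a discrete variate given as {value: probability}."""
    return sum(p * value for value, p in distribution.items())


# Assumed independent variates: a fair die and a fair coin scored 0 or 1.
die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Joint probabilities of independent variates are products of the separate ones.
joint = {(x, y): px * py for (x, px), (y, py) in product(die.items(), coin.items())}

e_sum = sum(p * (x + y) for (x, y), p in joint.items())
e_product = sum(p * x * y for (x, y), p in joint.items())

print(e_sum, expectation(die) + expectation(coin))      # both 4
print(e_product, expectation(die) * expectation(coin))  # both 7/4
```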
STATISTICAL MODELS
I: THE BINOMIAL DISTRIBUTION

3.1. Tossing a Penny. Let $C(E, \bar{E})$ denote an event the
outcome of which is either the occurrence of a certain event $E$
or its non-occurrence, denoted by $\bar{E}$. Now suppose that the
probability of $E$ happening, given $C$, is $p$, and that of $\bar{E}$ is $q$.
We ask what is the probability that in $n$ occurrences of $C$ there
will be exactly $x$ occurrences of $E$?

Once again we toss a penny, and assume that the constant
probability of obtaining a head in each throw is equal to that
of obtaining a tail. Hence, $p = q = \tfrac{1}{2}$. The outcome of a
single toss is either a head (H) or a tail (T). We toss again, and
the outcome of this second toss is again either H or T, and is
independent of the outcome of the first toss. Consequently the
outcome of two tosses will be

either HH or HT or TH or TT

The outcome of 3 tosses may be written down as

either HHH or HHT or HTH or THH
or HTT or THT or TTH or TTT

If now we disregard the order in which H and T occur, we may
rewrite the outcome of 2 tosses as

1 HH or 2 HT or 1 TT

and that of 3 tosses as

1 HHH or 3 HHT or 3 HTT or 1 TTT

Writing HHH as $H^3$, HHT as $H^2T^1$, etc., we have

outcome of 2 tosses: either $1H^2$ or $2H^1T^1$ or $1T^2$
outcome of 3 tosses: either $1H^3$ or $3H^2T^1$ or $3H^1T^2$ or $1T^3$

By analogy, in 4 tosses we shall have:

either $1H^4$ or $4H^3T^1$ or $6H^2T^2$ or $4H^1T^3$ or $1T^4$

In 4 tosses then there are $1 + 4 + 6 + 4 + 1 = 16 = 2^4$
possible outcomes, and the respective frequencies of 4 heads,
3 heads, 2 heads, 1 head and 0 heads will be 1, 4, 6, 4, 1 and the
corresponding relative frequencies, $\tfrac{1}{16}$, $\tfrac{1}{4}$, $\tfrac{3}{8}$, $\tfrac{1}{4}$ and $\tfrac{1}{16}$.

We may arrive at these figures by a different route. We
have $p = q = \tfrac{1}{2}$. Since each toss is independent of every
other toss, the probability of four heads, $H^4$, in four tosses is
$\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} = (\tfrac{1}{2})^4 = \tfrac{1}{16}$. Next, the probability of throwing
2 heads and 2 tails, say, in some specified order (HHTT, for
instance) is $\tfrac{1}{2} \cdot \tfrac{1}{2} \cdot (1 - \tfrac{1}{2})(1 - \tfrac{1}{2}) = \tfrac{1}{16}$; but we may arrange
2H and 2T in a group of 4 in 6 different ways (HHTT HTTH
TTHH THHT THTH HTHT); therefore the probability of
obtaining $H^2T^2$, irrespective of order, is $6 \times \tfrac{1}{16} = \tfrac{3}{8}$, the relative
frequency we obtained before.

Suppose now we require to know the probability of exactly 
47 heads and 53 tails in 100 tosses. Is there some compara- 
tively simple way of obtaining this probability without, for 
instance, setting out all the possible arrangements of 47 heads 
and 53 tails in 100 tosses? 

3.2. Generating Function of Binomial Probabilities. Looking
once more at the possible outcomes of 4 tosses, we recall
that there are sixteen possible results: either $H^4$ or 4 different
arrangements of $H^3T^1$ or 6 different arrangements of $H^2T^2$ or 4
different arrangements of $H^1T^3$ or $T^4$. And we notice that

$$H^4,\ 4H^3T^1,\ 6H^2T^2,\ 4H^1T^3,\ T^4$$

are the successive terms in the expansion of

$$(H + T)(H + T)(H + T)(H + T),\ \text{i.e., of}\ (H + T)^4.$$

Now, in the general case, where $p$ is the constant probability
of the event $E$ and $q$ is that of $\bar{E}$, and $p + q = 1$, the
probability of $E$ occurring exactly $x$ times in $n$ occurrences of
$C(E, \bar{E})$ in some specified order is $p^xq^{n-x}$. And since the
number of different order-arrangements of $x$ $E$'s and $(n - x)$ $\bar{E}$'s
is $n!/x!(n - x)!$ or $\binom{n}{x}$, the probability of exactly $x$ $E$'s is

$$\binom{n}{x}p^xq^{n-x} \qquad (3.2.1)$$

Now the expansion of $(q + pt)^n$ is, by the Binomial Theorem,

$$(q + pt)^n = q^n + \binom{n}{1}q^{n-1}pt + \binom{n}{2}q^{n-2}p^2t^2 + \ldots + \binom{n}{x}q^{n-x}p^xt^x + \ldots + p^nt^n \qquad (3.2.2)$$

Hence, denoting the probability of exactly $x$ $E$'s in $n$ occurrences
of the context-event $C(E, \bar{E})$ by $p_n(x)$,

$$(q + pt)^n = p_n(0) + p_n(1)t + p_n(2)t^2 + \ldots + p_n(x)t^x + \ldots + p_n(n)t^n \qquad (3.2.3)$$

If now we put $t = 1$, we have

$$1 = p_n(0) + p_n(1) + p_n(2) + \ldots + p_n(x) + \ldots + p_n(n)$$
and this is only to be expected, for the right-hand side is the
probability of either 0 or 1 or 2 or ... or $n$ $E$'s in $n$ occurrences
of $C$, which is clearly 1.
We may say, then,

If the probability, $p$, of an event $E$ is constant for all
occurrences of its context event $C$, the outcome of which
is either $E$ or $\bar{E}$, then the probability of exactly $x$ $E$'s in $n$
occurrences of $C$ is given by the coefficient of $t^x$ in the
expansion of $(q + pt)^n$.

Thus $(q + pt)^n$ is the Probability Generating Function, p.g.f.,
for this particular distribution, which because of its method
of generation, is called the binomial distribution.
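
The coefficient of $t^x$ in $(q + pt)^n$ is $\binom{n}{x}p^xq^{n-x}$, and this answers the question put at the end of 3.1 about 47 heads in 100 tosses. A minimal Python sketch follows; the function name `binomial_probability` is ours, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def binomial_probability(x, n, p):
    """Coefficient of t**x in (q + p*t)**n, with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# The numbers of arrangements for 4 tosses: 1, 4, 6, 4, 1.
print([comb(4, x) for x in range(5)])
# Probability of exactly 47 heads in 100 tosses of a symmetrical penny (roughly 0.067).
print(binomial_probability(47, 100, 0.5))
```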

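3.3. Binomial Recursion Formula. Replacing $x$ by $x + 1$
in (3.2.1), we have

$$p_n(x + 1) = \binom{n}{x+1}q^{n-x-1}p^{x+1}.$$

Hence

$$p_n(x + 1) = \frac{n - x}{x + 1} \cdot \frac{p}{q} \cdot p_n(x) \qquad (3.3.1)$$

Now $p_n(0) = q^n$, which for a given $q$ and $n$ is quickly calculated.
Then $p_n(1) = \frac{n}{1} \cdot \frac{p}{q} \cdot p_n(0)$; $p_n(2) = \frac{n - 1}{2} \cdot \frac{p}{q} \cdot p_n(1)$, etc.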
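
The recursion lends itself to a short routine. The Python sketch below (the function name is ours, and $0 < p < 1$ is assumed so that the division by $q$ is legitimate) starts from $p_n(0) = q^n$ and builds the remaining probabilities one at a time.

```python
def binomial_probabilities(n, p):
    """All of p_n(0), ..., p_n(n), built up from p_n(0) = q**n by the recursion."""
    q = 1.0 - p
    probabilities = [q ** n]
    for x in range(n):
        # p_n(x + 1) = (n - x)/(x + 1) * (p/q) * p_n(x)
        probabilities.append(probabilities[-1] * (n - x) / (x + 1) * p / q)
    return probabilities


print(binomial_probabilities(4, 0.5))   # 1/16, 1/4, 3/8, 1/4, 1/16 as decimals
```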

3.4. Some Properties of the Binomial Distribution. Let us 
calculate the binomial probabilities for p = -fa, « = 5. 

By (3.3.1) 



while 
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... 5 — * 1 ., 
of course (why?). Fig. 3.4 shows histograms of the Binomial
distribution for different values of $p$ and $n$. It will be seen
(Fig. 3.4 (c)) that, whenever $p = q = \tfrac{1}{2}$, the histogram is
symmetrical, whatever the value of $n$. Otherwise the distribution
is skew, although for a given value of $p$ the skewness
decreases as $n$ increases. The distribution is unimodal, unless
$pn$ is small.



[Fig. 3.4 (c). Binomial Distribution ($p = \tfrac{1}{2} = q$, $n = 6$). Horizontal axis: $x$ = 0, 1, 2, 3, 4, 5, 6.]



From (3.3.1) we see that $p_n(x + 1)$ will be greater than
$p_n(x)$ so long as $\frac{n - x}{x + 1} \cdot \frac{p}{q} > 1$, i.e., putting $q = 1 - p$, so long
as $\frac{n + 1}{x + 1} > \frac{1}{p}$. We see, then, that $p_n(x)$ increases with $x$
until $x > p(n + 1) - 1$. Taking $p = \tfrac{1}{4}$, $n = 16$, $p_{16}(x)$
increases until $x > 17/4 - 1 = 3.25$. But since $x$ can only
take integral values, this means that $p_{16}(x)$ is a maximum when
$x = 4$.
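
The rule that $p_n(x)$ rises until $x$ exceeds $p(n + 1) - 1$ can be compared with a direct search for the largest probability. The Python sketch below is ours (the function name is hypothetical); it takes $p = \tfrac{1}{4}$ and $n = 16$, for which $p(n + 1) - 1 = 3.25$.

```python
from math import comb


def most_probable_count(n, p):
    """The value of x for which the binomial probability p_n(x) is greatest."""
    probabilities = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    return max(range(n + 1), key=probabilities.__getitem__)


# With p = 1/4 and n = 16 the threshold p(n + 1) - 1 is 3.25, so the maximum falls at 4.
print(most_probable_count(16, 0.25))
```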

3.5. Moment Generating Functions. Can we find functions 
which generate the moments of a distribution in a manner 
similar to that in which the probability generating function 
generates the probabilities of a variate in a certain set of 
circumstances? Let us replace the $t$ in the p.g.f., $G(t)$, of a
distribution by $e^t$ and call the function of $t$ so formed $M(t)$.
Then

$$M(t) = G(e^t) = \mathcal{E}(e^{xt}) = \sum_{i=1}^{n} p(x_i)e^{x_it}; \quad (i = 1, 2, 3, \ldots n)$$
$$= \sum_{i=1}^{n} p(x_i)\left(1 + x_it + x_i^2t^2/2! + \ldots + x_i^rt^r/r! + \ldots\right)$$
$$= \sum_{i=1}^{n} p(x_i) + \Big[\sum_{i=1}^{n} p(x_i)x_i\Big]t + \Big[\sum_{i=1}^{n} p(x_i)x_i^2\Big]t^2/2! + \ldots + \Big[\sum_{i=1}^{n} p(x_i)x_i^r\Big]t^r/r! + \ldots$$
$$= 1 + \mu_1't/1! + \mu_2't^2/2! + \ldots + \mu_r't^r/r! + \ldots \qquad (3.5.1)$$

The $r$th moment about $x = 0$, $\mu_r'$, is consequently the
coefficient of $t^r/r!$ in the expansion of $M(t)$, and $M(t)$ is the
Moment-generating Function required. We see then that,
providing the sum $\sum_{i=1}^{n} p(x_i)e^{x_it}$ exists (which it will always do
when $n$ is finite), the Moment-generating Function, $M(t)$, for a
given distribution is obtained by replacing $t$ in the
probability-generating function, $G(t)$, of that distribution, by $e^t$.

Assuming now that we may differentiate both sides of (3.5.1),

$$\frac{dM(t)}{dt} = \mu_1' + \mu_2't + \ldots + \mu_r't^{r-1}/(r - 1)! + \ldots$$

and putting $t = 0$,

$$\mu_1' = \left[\frac{dM(t)}{dt}\right]_{t=0} = M'(0)$$

and, in general,

$$\mu_r' = \left[\frac{d^rM(t)}{dt^r}\right]_{t=0} = M^{(r)}(0) \qquad (3.5.2)$$

For a continuous variate with probability-density $\phi(x)$,

$$M(t) = \mathcal{E}(e^{xt}) = \int_{-\infty}^{+\infty} e^{xt}\phi(x)\,dx \qquad (3.5.2a)$$

Compare with 2.14.2.
In the particular case of the Binomial Distribution, $G(t) \equiv
(q + pt)^n$. Hence $M(t) = (q + pe^t)^n$. Consequently, the mean
$\mu_1'$, is given by

$$\mu_1' = M'(0) = \left[\frac{d}{dt}(q + pe^t)^n\right]_{t=0} = \left[npe^t(q + pe^t)^{n-1}\right]_{t=0},$$

which, since $(q + p) = 1$, gives

$$\mu_1' = np \qquad (3.5.3)$$

Likewise,

$$\mu_2' = M''(0) = \left[\frac{d^2}{dt^2}(q + pe^t)^n\right]_{t=0} = \left[\frac{d}{dt}\{npe^t(q + pe^t)^{n-1}\}\right]_{t=0}$$
$$= \left[npe^t(q + pe^t)^{n-1} + n(n - 1)p^2e^{2t}(q + pe^t)^{n-2}\right]_{t=0}$$
$$= np + n(n - 1)p^2 = \mu_1' + n(n - 1)p^2.$$

Therefore, the second moment about the mean, the variance of
the distribution, $\mu_2$, is given by

$$\mu_2 = \mu_2' - (\mu_1')^2 = np + n(n - 1)p^2 - n^2p^2 = np(1 - p) = npq \qquad (3.5.4)$$

The mean of the Binomial distribution is $np$ and the
variance is $npq$.
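
These two results can be checked numerically by differentiating $M(t) = (q + pe^t)^n$ at $t = 0$ with finite differences. The Python sketch below is an approximation only: the step $h = 0.001$, the function names, and the trial values $n = 50$, $p = 0.04$ are our own choices.

```python
import math


def mgf(t, n, p):
    """Moment-generating function (q + p*e^t)^n of the binomial distribution."""
    return (1.0 - p + p * math.exp(t)) ** n


def derivative_at_zero(f, order, h=1e-3):
    """Central-difference estimate of the first or second derivative of f at t = 0."""
    if order == 1:
        return (f(h) - f(-h)) / (2 * h)
    if order == 2:
        return (f(h) - 2 * f(0.0) + f(-h)) / h ** 2
    raise ValueError("only first and second derivatives are sketched here")


def binomial_mean_and_variance(n, p):
    first = derivative_at_zero(lambda t: mgf(t, n, p), 1)    # estimate of M'(0)
    second = derivative_at_zero(lambda t: mgf(t, n, p), 2)   # estimate of M''(0)
    return first, second - first ** 2


# Should be close to np = 2 and npq = 1.92.
print(binomial_mean_and_variance(50, 0.04))
```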

Exercise: Show that $\mu_3' = np[(n - 1)(n - 2)p^2 + 3(n - 1)p + 1]$.

Generally, however, we are more interested in the mean-moments
(moments about the mean of a distribution) than in
those about $x = 0$. If now we assume that $\mu_1' = M'(0) = m$,
say, and transfer the origin to $x = m$, then, measuring the
variate from this new origin and calling the new variate so
formed, $X$, we have $x = X + m$. Consequently,

$$\mathcal{E}(e^{xt}) = \mathcal{E}(e^{(X + m)t}) = e^{mt}\,\mathcal{E}(e^{Xt}),$$

or, if $M_m(t)$ denotes the mean-moment generating function,

$$M(t) = e^{mt}M_m(t), \quad\text{and}\quad M_m(t) = e^{-mt}M(t) \qquad (3.5.5)$$

It follows that the generating function for moments about any
line $x = a$ is obtained by multiplying $M(t)$ by $e^{-at}$, while the
$r$th mean-moment of the distribution, $\mu_r$, is given by

$$\mu_r = \left[\frac{d^r}{dt^r}\{e^{-mt}M(t)\}\right]_{t=0} = M_m^{(r)}(0) \qquad (3.5.6)$$






Exercise: Show by direct differentiation that for the binomial
distribution:

(i) $\mu_1 = M_m'(0) = \left[\frac{d}{dt}\{e^{-npt}(q + pe^t)^n\}\right]_{t=0} = 0$

(ii) $\mu_2 = M_m''(0) = \left[\frac{d^2}{dt^2}\{e^{-npt}(q + pe^t)^n\}\right]_{t=0} = npq$

3.6. Fitting a Binomial Distribution. So far we have considered
Binomial distributions for which the probability of $E$
in $n$ occurrences of $C$ has been known in advance. In practice
this rarely happens. Consider the following example:



Worked Example: The distribution of headless matches per box of
50 in a total of 100 boxes is given in the following table:

No. of headless matches per box     0    1    2    3    4    5    6    7   Total
No. of boxes                       12   27   29   19    8    4    1    0    100



Let us assume that the distribution of headless matches per box
over 100 boxes is binomial. We have $n = 50$, but we have no
a priori value for $p$. We remember, however, that the mean is
$np$. If, then, we find the mean of the observed distribution, we
can estimate $p$. Thus

$$np = 50p = \frac{0 \times 12 + 1 \times 27 + 2 \times 29 + 3 \times 19 + 4 \times 8 + 5 \times 4 + 6 \times 1}{100}$$

which gives $p = 0.04$ and the mean = 2. With this value of $p$, the
binomial distribution of frequencies of headless matches per box
over 100 boxes is given by

$$100(0.96 + 0.04t)^{50}.$$

Using the method of 3.4, the reader should verify that the estimated
frequencies are those given in the following table:



No. of headless matches per box               0      1      2      3     4     5     6     7
Observed No. of boxes                        12     27     29     19     8     4     1     0
Estimated No. of boxes                    12.97  26.99  27.50  18.30  8.95  3.43  1.07  0.28
Estimated No. of boxes (nearest integer)     13     27     28     18     9     3     1     0
The "fit" is seen to be quite good. If now we compare the
variance of the observed distribution with that of the theoretical,
estimated, distribution, we find that, the mean being 2,

(a) variance of observed distribution

$$= [12 \cdot (-2)^2 + 27 \cdot (-1)^2 + 19 \cdot 1^2 + 8 \cdot 2^2 + 4 \cdot 3^2 + 1 \cdot 4^2]/100 = 1.78$$

(b) variance of theoretical, estimated distribution

$$= npq = 50 \times 0.04 \times 0.96 = 1.92$$
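
The whole fitting procedure of this example can be retraced in a few lines. The Python sketch below uses the observed frequencies of the worked example; the function name `fit_binomial` is ours. Frequencies computed afresh in this way may differ slightly in the last digits from those printed in the table, presumably through rounding in the hand calculation.

```python
from math import comb


def fit_binomial(observed, n):
    """Estimate p from the observed mean; return p, expected frequencies, observed variance."""
    boxes = sum(observed)
    mean = sum(x * f for x, f in enumerate(observed)) / boxes
    p = mean / n                                   # from mean = n*p
    expected = [boxes * comb(n, x) * p ** x * (1 - p) ** (n - x)
                for x in range(len(observed))]
    variance = sum(f * (x - mean) ** 2 for x, f in enumerate(observed)) / boxes
    return p, expected, variance


observed = [12, 27, 29, 19, 8, 4, 1, 0]   # boxes with 0, 1, ..., 7 headless matches
p, expected, variance = fit_binomial(observed, n=50)
print(p)                                  # estimate of p
print([round(e) for e in expected])       # expected boxes, nearest integer
print(variance, 50 * p * (1 - p))         # observed variance and npq
```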

3.7. "So Many or More". Usually we are more interested
in the probability of at least so many occurrences of an event,
in $n$ occasions of $C(E, \bar{E})$ rather than in the probability of
exactly so many occurrences of $E$. Thus our match-manufacturer
would, probably, not worry unduly about how many
boxes in a batch of 100 contained exactly 4 headless matchsticks,
but he might well be concerned to estimate the probable
number of boxes containing 4 or more headless sticks.

Let $P_n(x \ge k)$ denote the probability of $k$ or more occurrences
of $E$ in $n$ $C(E, \bar{E})$, and let $P_n(x < k)$ denote the probability of
less than $k$ such occurrences. Then the probability of four or
more headless matchsticks per box of 50 is denoted by $P_{50}(x \ge 4)$.
Now

$$P_{50}(x \ge 4) = p_{50}(4) + p_{50}(5) + \ldots + p_{50}(50) = \sum_{x=4}^{50} p_{50}(x).$$

Since, however,

$$\sum_{x=0}^{50} p_{50}(x) = 1, \quad P_{50}(x \ge 4) = 1 - \sum_{x=0}^{3} p_{50}(x) = 1 - P_{50}(x < 4).$$

Generally,

$$P_n(x \ge k) = 1 - P_n(x < k) \qquad (3.7.1)$$
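
For the match example, formula (3.7.1) needs only the four probabilities $p_{50}(0)$ to $p_{50}(3)$. A minimal Python sketch follows; the function name `at_least` is ours, and $p = 0.04$ is the value estimated in 3.6.

```python
from math import comb


def at_least(k, n, p):
    """P_n(x >= k), found as 1 - P_n(x < k)."""
    fewer = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k))
    return 1.0 - fewer


# Probability of 4 or more headless matches in a box of 50 when p = 0.04 (about 0.14).
print(at_least(4, 50, 0.04))
```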

When k is small and n not too large, this formula is useful, for the calculation of $P_n(x < k)$ is then not too tedious and the successive values $p_n(0), p_n(1), \dots, p_n(k-1)$ may be evaluated directly using the formula (3.3.1). When n is large and k is large, however, the evaluation of a single binomial probability is tiresome enough, let alone that of the sum of several such probabilities. We may overcome this difficulty by using the facts that:

when n is large and p is small, the binomial distribution approximates to the Poisson distribution (see next chapter); while

when n is large but p is not small, it approximates to the normal distribution (see Chapter Five).

In such situations we use the properties of these distributions to approximate $P_n(x \ge k)$. Alternatively, for k < 50, n < 99 we may use what is known as the Incomplete Beta Function Ratio,¹ usually denoted by $I_p(k, n - k + 1)$, which gives the value of $P_n(x \ge k)$ for probability, p, of E:

$$P_n(x \ge k) = I_p(k, n - k + 1) \qquad (3.7.2)$$

Tables of the Incomplete B-Function Ratio, edited by Karl Pearson, are published by the Biometrika Office, University College, London.
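Where tables are not to hand, formula (3.7.1) is easy to evaluate by machine. The following is a minimal sketch, not part of the original text: the function names `binom_pmf` and `at_least` are illustrative, and the values n = 50, p = 0.04 are those of the match-manufacturer example.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of exactly x occurrences in n occasions."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def at_least(k, n, p):
    """P_n(x >= k) computed as 1 - P_n(x < k), i.e. formula (3.7.1)."""
    return 1.0 - sum(binom_pmf(x, n, p) for x in range(k))

# Four or more headless matchsticks in a box of 50, with p = 0.04.
tail = at_least(4, 50, 0.04)

# The same quantity summed directly, term by term, as a cross-check.
direct = sum(binom_pmf(x, 50, 0.04) for x in range(4, 51))

print(round(tail, 4), round(direct, 4))
```

The complement form needs only k terms, which is why it is the more convenient of the two when k is small.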



Mathematical Note to Chapter Three

A. The Gamma Function. If n is positive the infinite integral² $\int_0^\infty x^{n-1}\exp(-x)\,dx$ has a finite value. It is clearly a function of n and is called the Gamma Function; we write

$$\Gamma(n) \equiv \int_0^\infty x^{n-1}\exp(-x)\,dx \qquad (3.A.1)$$

We have immediately

$$\Gamma(1) = \int_0^\infty \exp(-x)\,dx = 1 \qquad (3.A.2)$$

If in (3.A.1) we put $x = X^2$, we have an alternative definition of $\Gamma(n)$, for, writing $dx = 2X\,dX$, we have

$$\Gamma(n) = 2\int_0^\infty X^{2n-1}\exp(-X^2)\,dX \qquad (3.A.3)$$

Returning to (3.A.1) and integrating by parts,³ we have, if n > 1,

$$\Gamma(n) = \left[-x^{n-1}\exp(-x)\right]_0^\infty + (n-1)\int_0^\infty x^{n-2}\exp(-x)\,dx = (n-1)\Gamma(n-1) \qquad (3.A.4)$$

¹ See Note at end of this chapter.
² See P. Abbott, Teach Yourself Calculus, pp. 227 et seq.
³ Abbott, op. cit., pp. 188 et seq.

Applying this formula to the case where n is a positive integer, we have

$$\Gamma(n) = (n-1)(n-2)\cdots 2\cdot 1\cdot\Gamma(1) = (n-1)! \qquad (3.A.5)$$

B. The Beta Function. Next consider the integral $\int_0^1 x^{m-1}(1-x)^{n-1}\,dx$. If m and n are positive, this integral is finite and is a function of m and n. We call this function the Beta Function and write

$$B(m, n) \equiv \int_0^1 x^{m-1}(1-x)^{n-1}\,dx \qquad (3.B.1)$$

Clearly, $B(1, 1) = 1$ (3.B.2)

Now put $t = 1 - x$; then $dt = -dx$ and

$$B(m, n) = -\int_1^0 (1-t)^{m-1}t^{n-1}\,dt = \int_0^1 t^{n-1}(1-t)^{m-1}\,dt = B(n, m) \qquad (3.B.3)$$

Thus we see that the Beta Function is symmetrical in m and n. Now make another substitution, $x = \sin^2\phi$, $dx = 2\sin\phi\cos\phi\,d\phi$. When x = 0, $\phi = 0$ and when x = 1, $\phi = \pi/2$. Thus

$$B(m, n) = 2\int_0^{\pi/2}\sin^{2m-1}\phi\,\cos^{2n-1}\phi\,d\phi \qquad (3.B.4)$$

a useful alternative form. It follows at once that

$$B(\tfrac{1}{2}, \tfrac{1}{2}) = 2\int_0^{\pi/2} d\phi = \pi \qquad (3.B.5)$$

C. Relation between Gamma and Beta Functions. It can be shown¹ that $B(m, n)$, $\Gamma(m)$ and $\Gamma(n)$ are related by the formula

$$B(m, n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)} \qquad (3.C.1)$$

which immediately displays the symmetry of the B-function in m and n. It follows that, since $B(\tfrac{1}{2}, \tfrac{1}{2}) = \pi$ and $\Gamma(1) = 1$,

$$B(\tfrac{1}{2}, \tfrac{1}{2}) = (\Gamma(\tfrac{1}{2}))^2 \quad\text{or}\quad \Gamma(\tfrac{1}{2}) = \sqrt{\pi} \qquad (3.C.2)$$

But, using (3.A.3), we have

$$\Gamma(\tfrac{1}{2}) = 2\int_0^\infty \exp(-x^2)\,dx.$$

Consequently

$$\int_0^\infty \exp(-x^2)\,dx = \tfrac{1}{2}\sqrt{\pi} \qquad (3.C.3)$$

¹ Abbott, op. cit., p. 225.

a result we shall need later.
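The results (3.A.5), (3.C.1), (3.C.2) and (3.C.3) can be checked numerically. The sketch below is an illustration only: it relies on `math.gamma` from the Python standard library, and the helper names `beta` and `half_line_gauss` are assumptions introduced here, not notation from the text. The integral in (3.C.3) is approximated by a midpoint rule truncated at x = 10, beyond which the integrand is negligible.

```python
import math

# (3.A.5): Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

def beta(m, n):
    """B(m, n) from the Gamma function, as in (3.C.1)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

# (3.C.2): Gamma(1/2) should equal the square root of pi.
gamma_half = math.gamma(0.5)

def half_line_gauss(upper=10.0, steps=100_000):
    """Midpoint-rule estimate of the integral of exp(-x^2) over (0, upper)."""
    h = upper / steps
    return h * sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(steps))

print(gamma_half, math.sqrt(math.pi))
print(half_line_gauss(), math.sqrt(math.pi) / 2)
```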

D. The Incomplete B-function Ratio. We have seen that if the function $x^{m-1}(1-x)^{n-1}$ is integrated over the range (0, 1) the result is a function of m and n. Suppose now that we integrate over only a portion of the range, from 0 to t, say; then

$$B_t(m, n) = \int_0^t x^{m-1}(1-x)^{n-1}\,dx$$

is a function of t, m and n and is called the Incomplete B-Function. Dividing this incomplete B-function by the complete B-function gives us another function of t, m and n, called the Incomplete B-Function Ratio, to which we referred in 3.7. We denote it by $I_t(m, n)$ and write

$$I_t(m, n) \equiv B_t(m, n)/B(m, n) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}\int_0^t x^{m-1}(1-x)^{n-1}\,dx$$

If, moreover, m and n are integers,

$$I_t(m, n) = \frac{(m+n-1)!}{(m-1)!\,(n-1)!}\int_0^t x^{m-1}(1-x)^{n-1}\,dx \qquad (3.D.1)$$

Now put t = p, m = k, and n = n − k + 1. Then

$$I_p(k, n-k+1) = \frac{n!}{(k-1)!\,(n-k)!}\int_0^p x^{k-1}(1-x)^{n-k}\,dx \qquad (3.D.2)$$

Integrating by parts and putting q = 1 − p,

$$\int_0^p x^{k-1}(1-x)^{n-k}\,dx = \frac{p^k q^{n-k}}{k} + \frac{(n-k)\,p^{k+1}q^{n-k-1}}{k(k+1)} + \dots + \frac{(n-k)!\;p^n}{k(k+1)\cdots(n-1)n}$$

Consequently,

$$I_p(k, n-k+1) = \binom{n}{k}p^k q^{n-k} + \binom{n}{k+1}p^{k+1}q^{n-k-1} + \dots + p^n$$
$$= p_n(k) + p_n(k+1) + \dots + p_n(n)$$
$$= P_n(x \ge k) \qquad (3.7.2)$$

The Tables of the Incomplete B-function Ratio referred to in 3.7 give the values of $I_t(m, n)$ for $0 \le n \le 50$ and $n \le m \le 50$. The values of t are given in steps of 0.01. Thus if n > m we cannot use the tables directly to evaluate $I_t(m, n)$. However, we can make use of the simple relation

$$I_t(m, n) = 1 - I_{1-t}(n, m) \qquad (3.D.3)$$

which is easily proved as follows. Writing x = 1 − X in (3.D.1), we have

$$I_t(m, n) = -\frac{(m+n-1)!}{(m-1)!\,(n-1)!}\int_1^{1-t}(1-X)^{m-1}X^{n-1}\,dX$$
$$= \frac{(m+n-1)!}{(m-1)!\,(n-1)!}\int_{1-t}^{1}X^{n-1}(1-X)^{m-1}\,dX$$
$$= \frac{(m+n-1)!}{(m-1)!\,(n-1)!}\left[\int_0^1 X^{n-1}(1-X)^{m-1}\,dX - \int_0^{1-t}X^{n-1}(1-X)^{m-1}\,dX\right]$$
$$= 1 - I_{1-t}(n, m)$$
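As a numerical illustration of (3.7.2) and (3.D.3), the ratio in (3.D.1) can be evaluated by straightforward quadrature and compared with the binomial tail. This is a sketch under stated assumptions: `inc_beta_ratio` and `binom_tail` are hypothetical helper names, m and n are taken to be positive integers, and composite Simpson's rule stands in for the printed tables.

```python
from math import comb, factorial

def inc_beta_ratio(t, m, n, steps=2000):
    """I_t(m, n) for positive integers m, n, using (3.D.1).

    The integral is estimated by composite Simpson's rule; steps must be even.
    """
    f = lambda x: x ** (m - 1) * (1 - x) ** (n - 1)
    h = t / steps
    total = f(0.0) + f(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    integral = total * h / 3
    return factorial(m + n - 1) / (factorial(m - 1) * factorial(n - 1)) * integral

def binom_tail(k, n, p):
    """P_n(x >= k) summed directly from the binomial terms."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# (3.7.2) with n = 50, k = 4, p = 0.04: I_p(k, n - k + 1) against the tail sum.
lhs = inc_beta_ratio(0.04, 4, 50 - 4 + 1)
rhs = binom_tail(4, 50, 0.04)

# (3.D.3): I_t(m, n) = 1 - I_{1-t}(n, m), tried at t = 0.3, m = 3, n = 5.
sym_left = inc_beta_ratio(0.3, 3, 5)
sym_right = 1 - inc_beta_ratio(0.7, 5, 3)

print(lhs, rhs)
print(sym_left, sym_right)
```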



EXERCISES ON CHAPTER III

1. Calculate, correct to four decimal places, the binomial probabilities for p = 3/4, n = 8. Calculate the mean and variance.

2. If on the average rain falls on twelve days in every thirty, find the probability (i) that the first three days of a given week will be fine, and the remainder wet; (ii) that rain will fall on just three days of a given week. (L.U.)

3. In a book of values of a certain function there is one error on the average in m entries. Prove that when r values are turned up at random (with the possibility that any value may be selected more than once), the chance of all being accurate is (m − 1)/r times as great as that of having only one error included in the selection. Find r/m in terms of m in order that there may be a nine to one chance that the selection is free from any errors. In this case prove that as a very large set of tabulated values approaches perfection in accuracy, r increases to a limiting value of nearly 10.5% of the size of m. (L.U.)

4. Show that a measure of the skewness of the binomial distribution is given by $(q - p)/(npq)^{1/2}$ and its kurtosis is $3 + (1 - 6pq)/npq$.

5. Calculate the value of p if the ratio of the probability of an event happening exactly r times in n trials to the probability of the event happening exactly n − r times in n trials is independent of n. (0 < p < 1). (I.A.)

6. Table 7.1, page 136, gives 500 random digits grouped in 100 groups of 5 digits. Let the digits 0, 1, 2, 3 be each taken as indicating a success, S, in a certain trial and the digits 4, 5, 6, 7, 8, 9 a failure. Working along the rows of the table, count the number of S's in each 5-digit group. Form a frequency table giving the number of groups with 0, 1, 2, etc., S's. The theoretical distribution will be given by $100\left(\tfrac{6}{10} + \tfrac{4}{10}\right)^5$. Calculate the theoretical frequencies and compare these with those actually obtained. Repeat using the columns of Table 7.1, i.e., the first five random digits will be 280SG. Repeat taking 0, 1, 2 as indicating an S and 3, 4, 5, 6, 7, 8, 9 as indicating failure.

Solutions

1. 0, 0.0000; 1, 0.0004; 2, 0.0038; 3, 0.0231; 4, 0.0865; 5, 0.2076; 6, 0.3115; 7, 0.2675; 8, 0.1001.

2. (i) 0.0083; (ii) 0.2803.

3. r/m = log 0.9 / (m log [(m − 1)/m]).

5. p = 1/2.



CHAPTER FOUR 



STATISTICAL MODELS. II: THE POISSON 
DISTRIBUTION: STATISTICAL RARITY 

4.1. On Printer's Errors. I am correcting the page-proofs of a book. After having corrected some 50 pages, I find that, on the average, there are 2 errors per 5 pages. How do I set about estimating the percentage of pages in the whole book with 0, 1, 2, 3 . . . errors?

To use the Binomial distribution, we need to know not merely the number of times an event E, whose probability we wish to estimate, has occurred, but also the number of times it could have occurred but did not, i.e., we want to know n, the total number of occasions upon which the event both did and did not occur. But in our present problem, it is clearly ridiculous to ask how many times an error could have been made on one page but was not.

Here is a similar problem:

A small mass of a radioactive substance is so placed that each emission of a particle causes a flash on a specially prepared screen. The number of flashes in a given time-interval is recorded, and the mean number of flashes over a specified number of such intervals is found. On the assumption that the disintegration of any particular atom is purely fortuitous, what is the chance of observing some specified number of flashes in one time-interval?

Both these problems arise from situations in which the number, n', of occasions upon which an event, E, could or could not have occurred in a fixed interval of time or space is, to all intents and purposes, infinite, although over the N intervals sampled, E in fact is found to have occurred only a finite number of times, Nm, say.

We can use the Binomial distribution as a model only when we can assign values to p and to n. But in such cases as we are now discussing this is not possible: n = Nn' is indefinitely large because, although N is finite, n' is indefinitely large. Moreover, although we know the number of times, Nm, E has occurred in the N equal intervals, the ratio Nm/Nn', which we could have used, in the event of n' being known and finite, as an estimate of p, is now meaningless.

4.2. The Poisson Model. We are therefore faced with the task of so modifying our old model that we can circumvent difficulties of this kind. Our clue is this: if the number, Nm, of occurrences of E is finite in N fixed intervals (of time, length, area, volume) and the n of the Binomial expansion (= Nn', here) is very, very large, p (= Nm/Nn' = m/n') must be very small. We ask, then, what happens to the Binomial distribution under the conditions that (1) n tends to infinity, but (2) np remains finite, and thus p is extremely small (i.e., the event is relatively rare)?

The probability-generating function of the Binomial distribution is $(q + pt)^n$. Put np = m, so that p = m/n and, in accord with this, q = 1 − m/n. Then the p.g.f. becomes $(1 + m(t-1)/n)^n$ and this (see Abbott, Teach Yourself Calculus, p. 127) tends to the limit $e^{m(t-1)}$ as n tends to infinity. Thus under the conditions set down, the probability-generating function of the new, limit-distribution, is $e^{m(t-1)}$. This new distribution is called the Poisson Distribution. We have

$$e^{m(t-1)} = e^{-m}\cdot e^{mt} = e^{-m}\left(1 + mt/1! + m^2t^2/2! + \dots + m^rt^r/r! + \dots\right)$$

The probability of exactly x occurrences of a statistically rare event in an interval of a stated length will then be the coefficient of $t^x$ in this series, thus

$$p(x, m) = e^{-m}m^x/x! \qquad (4.2.1)$$

We note at once:

(i) that it is theoretically possible for any number of events to occur in an interval; and

(ii) that the probability of either 0 or 1 or 2 or . . . occurrences of the event in the interval is

$$e^{-m}\left(1 + m/1! + m^2/2! + \dots + m^r/r! + \dots\right) = e^{-m}e^{m} = 1,$$

as we should expect.
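The limiting argument of 4.2 can be watched at work numerically: holding np = m fixed while n grows, the binomial probabilities close in on those of (4.2.1). The sketch below is illustrative only; the choice m = 2, the values of n and the function names are assumptions made for the example.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Binomial probability of exactly x occurrences in n occasions."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """Poisson probability p(x, m) = e^{-m} m^x / x!, formula (4.2.1)."""
    return exp(-m) * m**x / factorial(x)

m = 2.0
gaps = []
for n in (10, 100, 10000):
    p = m / n  # keep np = m fixed while n increases
    gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(11))
    gaps.append(gap)
    print(n, round(gap, 5))

# Note (ii): the Poisson probabilities sum to 1 (the series is truncated here).
total = sum(poisson_pmf(x, m) for x in range(60))
print(total)
```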

4.3. Some Properties of the Poisson Distribution. (a) What exactly does the m in (4.2.1) signify? Since we have derived the Poisson distribution from the Binomial by putting p = m/n and letting n tend to infinity, we shall obtain the Poisson mean and the Poisson variance by operating in the same way on the Binomial mean, np, and the Binomial variance, npq. Thus we have

Poisson Mean
$$= \lim_{n\to\infty} np = \lim_{n\to\infty} n\cdot m/n = m \qquad (4.3.1)$$

Poisson Variance
$$= \lim_{n\to\infty} npq = \lim_{n\to\infty} n\cdot m/n\cdot(1 - m/n) = m \qquad (4.3.2)$$

Thus we see that the m in (4.2.1) is the value of both the mean and the variance of the new distribution.

(b) Higher moments of the distribution may be worked out by using 3.5.2. Since $G(t) = e^{m(t-1)}$, it follows that the moment-generating function is given by

$$M(t) = \exp[m(e^t - 1)] \qquad (4.3.3)$$

and the mean-moment generating function by

$$M_m(t) = e^{-mt}\cdot\exp[m(e^t - 1)] \qquad (4.3.4)$$

We have, for example,

$$\mu_2' = \left[\frac{d^2}{dt^2}\exp\{m(e^t - 1)\}\right]_{t=0} = \left[\frac{d}{dt}\left(me^tM(t)\right)\right]_{t=0} = \left[me^tM(t) + me^tM'(t)\right]_{t=0}$$
$$= mM(0) + mM'(0) = m(m + 1).$$

(c) To ease the work of calculating Poisson probabilities, we note that:

$$p(x + 1, m) = e^{-m}\cdot m^{x+1}/(x + 1)! = \frac{m}{x+1}\cdot p(x, m) \qquad (4.3.5)$$

where $p(0) = e^{-m}$. For convenience we give the following table:



Table 4.3. Values of $e^{-m}$

    m      e^{-m}      m     e^{-m}      m     e^{-m}
    0.01   0.9900      0.1   0.9048      1.0   0.3679
    0.02   0.9802      0.2   0.8187      2.0   0.1353
    0.03   0.9704      0.3   0.7408      3.0   0.0498
    0.04   0.9608      0.4   0.6703      4.0   0.0183
    0.05   0.9512      0.5   0.6065      5.0   0.0067
    0.06   0.9418      0.6   0.5488      6.0   0.0025
    0.07   0.9324      0.7   0.4966      7.0   0.0009
    0.08   0.9231      0.8   0.4493      8.0   0.0003
    0.09   0.9139      0.9   0.4066      9.0   0.0001

Note: Since $e^{-(x+y+z)} = e^{-x}\cdot e^{-y}\cdot e^{-z}$, we see that, if we have to use this table, $e^{-5.23}$, for example, = 0.0067 × 0.8187 × 0.9704 = 0.0053.

(d) Fig. 4.3 shows probability polygons for m = 0.1 and m = 3.0. It will be seen that when m < 1, the polygon is positively J-shaped, but, when m > 1, it becomes positively skew, tending towards symmetry as m assumes larger and larger values. From (4.3.5), it follows that p(x) increases with x, for m > 1, while x < m − 1 and, thereafter, decreases.

Fig. 4.3. Poisson Probability Polygons.

4.4. Worked Examples

1. Consider the proof-correcting problem discussed at the beginning of this chapter. m, the mean, is here 0.4. Consequently, using Table 4.3 and formula (4.3.5), the probabilities of 0, 1, 2, 3 . . . errors per page based on the 50 pages sampled are:

    x       0        1        2        3        4
    p(x)    0.6703   0.2681   0.0536   0.0071   0.0007

Thus, in 100 pages, we should expect 67 pages with 0 errors, 27 pages with 1 error, 5 pages with 2 errors and 1 page with 3 errors.

2. This is a classical example, but sufficiently entertaining to bear repetition.

Bortkewitsch collected data on the number of deaths from kicks from a horse in 10 Prussian Army Corps over a period of 20 years. It is assumed that relevant conditions remained sufficiently stable over this period for the probability of being kicked to death to remain constant.

His figures are:

    Actual deaths per corps    0     1     2     3     4     5     Total
    Observed frequency         109   65    22    3     1     0     200

The mean of the sample, m, is

$$\frac{1 \times 65 + 2 \times 22 + 3 \times 3 + 4 \times 1}{200} = 0.61.$$

Using the Poisson model with m = 0.61, the estimated frequency of x deaths per Corps in 200 corps-years will be given by

$$f(x) = 200\,e^{-0.61}\cdot(0.61)^x/x!$$

Using Table 4.3,

    p(0) = e^{-0.61} = 0.5433 and f(0) = 108.66

Using (4.3.5),

    p(1) = (0.61/1) × 0.5433 = 0.3314 and f(1) = 66.28
    p(2) = (0.61/2) × 0.3314 = 0.1011 and f(2) = 20.22
    p(3) = (0.61/3) × 0.1011 = 0.0206 and f(3) = 4.12
    p(4) = (0.61/4) × 0.0206 = 0.0031 and f(4) = 0.62
    p(5) = (0.61/5) × 0.0031 = 0.0004 and f(5) = 0.08

The "fit" is good.
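The arithmetic of this example follows directly from the recurrence (4.3.5), and a short program reproduces it. The sketch below is an illustration, not the author's working: the helper `poisson_probs` is a hypothetical name, and because the program does not round at each step its last digits may differ slightly from the hand calculation above.

```python
from math import exp

# Bortkewitsch's observed frequencies for 0, 1, ..., 5 deaths per corps-year.
observed = [109, 65, 22, 3, 1, 0]
total = sum(observed)

# Sample mean, used as the Poisson parameter m.
m = sum(x * f for x, f in enumerate(observed)) / total

def poisson_probs(m, count):
    """p(0), ..., p(count - 1) built up from p(0) = e^{-m} by (4.3.5)."""
    probs = [exp(-m)]
    for x in range(count - 1):
        probs.append(probs[-1] * m / (x + 1))
    return probs

expected = [total * p for p in poisson_probs(m, len(observed))]

for x, (obs, est) in enumerate(zip(observed, expected)):
    print(x, obs, round(est, 2))
```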

4.5. Approximation to Binomial Distribution. Being the limit of the Binomial distribution, when p becomes very small (the event is rare) and n tends to infinity, the Poisson distribution may be expected to provide a useful approximation to such a Binomial distribution. Moreover, it is much easier to calculate $e^{-m}m^x/x!$ than it is to calculate $\binom{n}{x}p^xq^{n-x}$.

Suppose we have a consignment of 1,000 cartons, each carton containing 100 electric light bulbs. Sampling reveals an average of 1 bulb per 100 defective.

If we use the Binomial model, with p = 1/100, q = 99/100 and n = 100, the probability of x defectives in 100 bulbs will be given by

$$p_{100}(x) = \binom{100}{x}(1/100)^x(99/100)^{100-x}$$

and

$$p_{100}(x + 1) = \frac{100 - x}{99(x + 1)}\cdot p_{100}(x)$$

Since the occurrence of a defective bulb is a rare event, we may use the Poisson model. In this case m = np = 1 and so

$$p(x, 1) = e^{-1}/x! \quad\text{while}\quad p(x + 1)/p(x) = 1/(x + 1).$$

The following table results:

    No. defectives per 100    0       1       2       3      4      5      6
    Binomial model            36.64   37.01   18.51   6.11   1.49   0.29   0.05
    Poisson model             36.79   36.79   18.40   6.13   1.53   0.31   0.05

The reader should check these figures as an exercise.
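One way to carry out this check is to apply the two recurrences of 4.5 by machine. The following is a minimal sketch under the example's own values (n = 100, p = 0.01); the list names `binom` and `pois` are illustrative. Since it carries full floating-point precision, small differences in the last printed digit from a table worked by hand are to be expected.

```python
from math import comb, exp

n, p = 100, 0.01
q = 1 - p
m = n * p  # Poisson mean, m = np = 1

# Binomial: p_100(0) = q^100, then p(x+1) = (n - x) p / ((x + 1) q) * p(x).
binom = [q ** n]
for x in range(6):
    binom.append(binom[-1] * (n - x) * p / ((x + 1) * q))

# Poisson: p(0) = e^{-m}, then p(x+1) = m / (x + 1) * p(x), formula (4.3.5).
pois = [exp(-m)]
for x in range(6):
    pois.append(pois[-1] * m / (x + 1))

# Expected numbers per 100 cartons under each model.
for x in range(7):
    print(x, round(100 * binom[x], 2), round(100 * pois[x], 2))
```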

4.6. The Poisson Probability Chart. As we saw when discussing Binomial probabilities, we frequently require to know the probability of so many or more occurrences of an event. It follows from (4.2.1) that the probability of k or more events in any interval, when the mean number of occurrences in a sample set of such intervals is m, will be given by

$$P(x \ge k, m) = \sum_{x=k}^{\infty} e^{-m}m^x/x! \qquad (4.6.1)$$

To avoid having to calculate successive terms of this series, we use the Poisson Probability Chart of Fig. 4.6. On the horizontal axis are values of m; across the chart are a series of curves corresponding to values of k = 1, 2, 3, . . .; while along the vertical axis are values of $P(x \ge k, m)$.

In the case considered in 4.5, m = 1. If we want to find the probability that a given batch of 100 bulbs shall contain 2 or more defectives, we run our eye up the line m = 1 until it intersects the curve k = 2; then, moving horizontally to the left, we find the required probability marked on the vertical axis, in this case 0.26. This means that of the 1,000 batches of 100 bulbs about 260 will contain 2 or more defectives.

We could have arrived at this result by recalling that the probability of 2 or more defectives is

$$P(x \ge 2, 1) = 1 - (p(0) + p(1)) = 1 - (0.3679 + 0.3679) = 0.2642.$$

We may also use the chart to find approximately the probability of an exact number of defectives, 2, say. We have already used the chart to find $P(x \ge 2, 1) = 0.26$. In a similar way we find that $P(x \ge 3, 1) = 0.08$, approximately. Therefore, p(2) = 0.26 − 0.08 = 0.18, approximately. The calculated value is, as we have seen, 0.1840.

4.7. The Negative Binomial Distribution. We have derived the Poisson distribution from the Binomial, and a necessary condition for the Binomial distribution to hold is that the probability, p, of an event E shall remain constant for all occurrences of its context-event C. Thus this condition must also hold for the Poisson distribution. But it does not follow that, if a set of observed frequencies is fairly closely approximated by some Poisson series, p is in fact constant, although, in certain circumstances, this may be a not unreasonable inference. If, however, it is known that p is not constant in its context C, another distribution, known as the Negative Binomial distribution, may provide an even closer "fit".

Suppose we have a Binomial distribution for which the variance, npq, is greater than the mean, np. Then q must be greater than 1, and since p = 1 − q, p must be negative. But np being positive, n must be negative also. Writing n = −N and p = −P, the p.g.f. for such a distribution will be

$$G(t) = (q - Pt)^{-N}$$

The trouble about this type of distribution lies in the interpretation, for we have defined probability in such a way that its measure must always be a number lying between 0 and 1 and, so, essentially positive. Again, since n is the number of context-events, how can it possibly be negative?

Any detailed discussion of this problem is beyond our scope, but the following points may be noted:

(1) It is often found that observed frequency distributions are represented by negative binomials and in some cases that this should be the case can be theoretically justified (G. U. Yule, Journal of the Royal Statistical Society, vol. 73, pp. 26 et seq.).

(2) In many cases, if two or more Poisson series are combined term by term, a negative binomial results.

(3) Frequency distributions with variance greater than the mean often arise when the probability p does not remain constant.

We conclude this section with an example of a kind fairly common in bacteriology, where, although a Poisson distribution might reasonably be expected to hold, a negative binomial gives a better fit.

The following table gives the number of yeast cells in 400 squares of a hemacytometer:

    Number of cells    0     1     2    3    4   5   Total
    Frequency          213   128   37   18   3   1   400

The mean is 0.68 and the variance 0.81, both correct to two decimal places. Putting np = 0.68 and npq = 0.81, we have q = 1.19 and, so, p = −0.19 and n = −3.59. The p.g.f. is thus $(1.19 - 0.19t)^{-3.59}$. Hence p(x) is the coefficient of $t^x$ in the expansion of this p.g.f.

Calculating these probabilities and comparing them with those obtained from a Poisson model with m = 0.68, we have

    No. of cells          0     1     2    3    4   5
    Observed frequency    213   128   37   18   3   1
    Negative binomial     214   123   45   13   4   1
    Poisson               203   138   47   11   2   0

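The two fitted rows can be reproduced from the quoted parameters. The sketch below is an assumption-laden illustration rather than the author's method: it takes the rounded values q = 1.19, P = 0.19, N = 3.59 and m = 0.68 from the text, and it generates the negative binomial terms from the expansion of $(q - Pt)^{-N}$ by the recurrence p(x + 1) = p(x) (N + x)/(x + 1) · P/q, which follows from the binomial series. The names `neg_binom_probs` and `poisson_probs` are hypothetical.

```python
from math import exp

def neg_binom_probs(q, P, N, count):
    """Coefficients of t^x in (q - P t)^(-N), for x = 0, ..., count - 1."""
    probs = [q ** (-N)]
    for x in range(count - 1):
        probs.append(probs[-1] * (N + x) / (x + 1) * (P / q))
    return probs

def poisson_probs(m, count):
    """Poisson probabilities from p(0) = e^{-m} and recurrence (4.3.5)."""
    probs = [exp(-m)]
    for x in range(count - 1):
        probs.append(probs[-1] * m / (x + 1))
    return probs

squares = 400
nb_fit = [round(squares * p) for p in neg_binom_probs(1.19, 0.19, 3.59, 6)]
po_fit = [round(squares * p) for p in poisson_probs(0.68, 6)]

print(nb_fit)
print(po_fit)
```

Both lists agree with the corresponding rows of the table when rounded to the nearest integer.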
EXERCISES ON CHAPTER FOUR

1. Rutherford and Geiger counted the number of alpha-particles emitted from a disc in 2,608 periods of 7.5 seconds duration. The frequencies are given below:

    Number per period.   Frequency.     Number per period.   Frequency.
    0                    57             8                    45
    1                    203            9                    27
    2                    383            10                   10
    3                    525            11                   4
    4                    532            12                   2
    5                    408            13                   0
    6                    273            14                   0
    7                    139

Show that the mean of the distribution is 3.870 and compare the relative frequencies with the corresponding probabilities of the "fitted" Poisson distribution.

2. If on the average m particles are emitted from a piece of radioactive material in 1 second, what is the probability that there will be a lapse of t seconds between 2 consecutive emissions?

3. A car-hire firm has two cars, which it hires out by the day. The number of demands for a car on each day is distributed as a Poisson distribution with mean 1.5. Calculate the proportion of days on which neither of the cars is used, and the proportion of days on which some demand is refused. If each car is used an equal amount, on what proportion of days is a given one of the cars not in use? What proportion of demands has to be refused? (R.S.S.)

4. Show that the sum of two Poisson variates is itself a Poisson variate with mean equal to the sum of the separate means.

5. Pearson and Morel (Ann. Eugenics, Vol. 1, 1925) give the following table showing the number of boys at given ages possessing 0, 1, 2, 3 . . . defective teeth:



    Number of teeth      Central ages (years).                     Total.
    affected.            7…     8…     10…    11…    13…
    0                    12     16     27     61     67       183
    1                    4      14     13     47     69       147
    2                    6      23     28     43     50       150
    3                    4      11     20     35     41       111
    4                    7      21     14     28     22       92
    5                    5      15     7      15     10       52
    6                    3      16     7      20     8        54
    7                    4      5      3      5      2        19
    8                    4      11     5      5      7        32
    9                    1      6      -      2      3        12
    10                   1      4      1      1      1        8
    11                   -      2      -      2      -        4
    12                   1      1      -      1      2        5
    Totals               52     145    125    265    282      869


Estimate the probability of a defective tooth for each age group. 
Plot this probability against age. Fit a negative binomial to each 
age group and also to the total. Fit a Poisson distribution to any 
group and to the total. Comment on your findings. (E.S.E.l) 



Solutions 

2. Complete Solution: If there are on the average m emissions in 1 second, there will be $m\,\delta t$ on the average in $\delta t$ seconds. The probability of no emissions in $\delta t$ will then be $\exp(-m\,\delta t)$ and that of 1 emission $e^{-m\,\delta t}\,m\,\delta t$. Therefore the probability of 0 emissions in n intervals of $\delta t$ and of 1 in the next such interval will be

$$\delta p = \exp(-mn\,\delta t)\exp(-m\,\delta t)\,m\,\delta t.$$

Let $n\,\delta t \to t$ as $n \to \infty$ and $\delta t \to 0$, and we have

$$\frac{dp}{dt} = me^{-mt} \quad\text{or}\quad dp = me^{-mt}\,dt.$$

3. 0.223; 0.191; 0.390; 0.187.

4. We use the fact (see Chapter Two) that the generating function of the sum of two independent variates is the product of the generating functions of the two variates. The p.g.f. of a Poisson variate with mean $m_1$ is $\exp(m_1(t - 1))$; that of a Poisson variate with mean $m_2$ is $\exp(m_2(t - 1))$. Hence the p.g.f. of the sum of these two variates is $\exp[(m_1 + m_2)(t - 1)]$, i.e., that of a Poisson variate whose mean is the sum of the separate means.



CHAPTER FIVE 



STATISTICAL MODELS 
III : THE NORMAL DISTRIBUTION 

5.1. Continuous Distributions. So far our model distribu- 
tions have been those of a discrete variate. Put rather crudely : 
up till now we have been concerned with the distribution of 
" countables ". Now we must consider distributions of what, 
equally crudely, we may call " measurables ", or continuous 
variates. 

Table 5.1 shows the distribution of heights of National 
Servicemen born in 1933 (mostly aged about 18 years 3 
months). 

Table 5.1 

(From "Heights and Weights of the Army Intake, 1951", by S. Rosenbaum (Directorate of Army Health), Journal of the Royal Statistical Society, Series A, vol. 117, Part 3, 1954.)



    Height (in.).     Number.
    59 and under      23
    60-               169
    61-               439
    62-               1,030
    63-               2,110
    64-               3,947
    65-               5,965
    66-               8,012
    67-               9,089
    68-               8,763
    69-               7,132
    70-               5,314
    71-               3,320
    72-               1,884
    73-               876
    74-               383
    75-               153
    76-               63
    77 and over       25

    Total             58,703



The heights, we are told, were only taken to whole inches, the 67-in. class representing heights from 66½ in. to 67½ in. Smaller class intervals could, of course, have been taken. In such a case, some of the classes might well have been null or empty. But, had the total number of men been indefinitely increased, there is theoretically no reason why any of these classes should have been null. By increasing the size of our sample and reducing the class interval, we make the steps of our histogram narrower and shallower.

The fact remains, however, that all measurement is approxi- 
mate. We never, in fact, measure the " true value " of any 
quantity. No matter how fine our unit of measurement, the 
most we can ever say is that "the height" of a certain man,
for instance, lies within some specified interval. The smaller
our unit, the smaller, of course, that interval will be, but there 
will always be an interval. In practice, of course, we cannot 
reduce the interval indefinitely, for, below a certain limit, that 
which we are attempting to " measure " no longer exhibits 
" definite " boundaries. If, however, we idealise the situation 
and ignore the existence of such a lower limit, then we can 
conceive of a position such that, no matter how small our 
interval, there will always be at least one value of our variate, 
whatever it is, lying within that interval. For instance, we 
can conceive of an infinite population of heights such that, no 
matter how small we chose our class intervals, there will always 
be at least one height lying within each interval. We then say 
that our variate (height in this case) varies continuously, and 
we come to the idea of a continuous distribution, where the 
relative frequency of the variate varies continuously as the 
variate itself varies continuously over its range. 

Such a distribution is essentially ideal, but we may regard 
any actual finite sample of measured quantities as a sample from 
such an infinite, continuous parent population of measurable 
items. 

One of the most important continuous distributions is that 
to which the distribution of heights in Table 5.1 approximates. 
It is called the Normal distribution. In his book The Advanced
Theory of Statistics, M. G. Kendall has commented as follows : 

" The normal distribution has had a curious history. 
It was first discovered by de Moivre in 1753 as the limiting 
form of the binomial, but was apparently forgotten and 
rediscovered later in the eighteenth century by workers 
engaged in investigating the theory of probability and the 
theory of errors. The discovery that errors ought, on 
certain plausible hypotheses, to be distributed normally 
led to a general belief that they were so distributed. . . . 
Vestiges of this dogma are still found in textbooks. It
was found in the latter half of the nineteenth century 
that the frequency distributions occurring in practice are 
rarely of the normal type and it seemed that the normal 
distribution was due to be discarded as a representation 
of natural phenomena. But as the importance of the 
distribution declined in the observational sphere it grew 
in the theoretical, particularly in the theory of sampling. 
It is in fact found that many of the distributions arising 
in that theory are either normal or sufficiently close to 
normality to permit satisfactory approximations by the 
use of the normal distribution. . . . For these and other 
reasons ... the normal distribution is pre-eminent among 
distributions of statistical theory " (vol. I, pp. 131-2). 

5.2. From the Binomial to the Normal Distribution. Consider two adjacent cells of a Binomial relative-frequency histogram (Fig. 5.2). We take the equal class-intervals to be unit intervals and the mid-point of each interval to correspond to the appropriate value of the variate. The relative-frequency of any one value of the variate is then represented by the area of the corresponding cell. Let the height of the cell corresponding to the value x of the variate be $y_x$. Then, since the class-intervals are unit intervals, the relative-frequency of the value x of our variate will be $y_x$. When the value of the variate changes from x to x + 1, the corresponding increase in the relative frequency is $y_{x+1} - y_x$. Denoting the increase in $y_x$ corresponding to an increase of $\Delta x$ (= 1) by $\Delta y_x$, we may write

$$\frac{\Delta y_x}{\Delta x} = y_x\left(y_{x+1}/y_x - 1\right)$$

The relative-frequency of the value x of a binomial variate is precisely the probability of exactly x occurrences of an event E in n occurrences of $C(E, \bar{E})$, i.e., $y_x = p_n(x)$. Consequently,

$$\frac{\Delta y_x}{\Delta x} = y_x\left[\frac{p_n(x+1)}{p_n(x)} - 1\right]$$

and, using (3.3.1),

$$\frac{\Delta y_x}{\Delta x} = y_x\left[\frac{(n - x)p}{(x + 1)q} - 1\right]$$

The Binomial mean is np, and if we transfer our origin of co-ordinates to the mean value of our variate, our new variate will be X = x − np. Thus we may put x = X + np, where X is the deviation of the original variate from its mean value.

Fig. 5.2. (Curve labelled Y = Y(X).)

Moreover, $p_n(X) = p_n(x)$ and so $y_X = y_x$, while $\Delta X$, being still unity, is equal to $\Delta x$. Therefore,

$$\frac{\Delta y_X}{\Delta X} = y_X\left[\frac{(nq - X)p}{(X + np + 1)q} - 1\right] = -y_X\,\frac{X + q}{(X + np + 1)q}$$

But when n is large and p is not of the order 1/n, both q and 1 are small compared with np. Thus

$$\frac{\Delta y_X}{\Delta X} \approx -y_X\,\frac{X/np}{(1 + X/np)q} = y_X\left(\frac{1}{(1 + X/np)q} - \frac{1}{q}\right)$$

Now let Y = Y(X) be the equation of a suitable continuous curve fitted to the vertices of the Binomial relative frequency polygon. Then we may put

$$\frac{dY}{dX} \approx \frac{\Delta Y}{\Delta X} = \frac{\Delta y_X}{\Delta X} \approx Y\left[\frac{1}{q(1 + X/np)} - \frac{1}{q}\right]$$

Integrating from X = 0 to X = X (see Abbott, Teach Yourself Calculus, p. 158),

$$\log_e Y = \frac{np}{q}\log_e(1 + X/np) - X/q + \log_e Y_0 \qquad (5.2.4)$$

where $Y_0$ is the value of Y at X = 0, i.e., the value of Y at the mean of the distribution.

Over the greater portion of the curve Y = Y(X), X will be small compared with np. Since we are approximating, anyway, we may, therefore, expand $\log_e(1 + X/np)$ in terms of X/np (see Abbott, op. cit., p. 332) and neglect powers of X/np greater than the second; thus

$$\log_e(1 + X/np) \approx X/np - X^2/2n^2p^2$$

(5.2.4) then becomes

$$\log_e(Y/Y_0) \approx \frac{np}{q}\left(X/np - X^2/2n^2p^2\right) - X/q = -X^2/2npq$$

or

$$Y \approx Y_0\exp(-X^2/2npq) \qquad (5.2.5)$$

Let us be quite sure what this equation says. For positive
and negative integral values of X (X = 0, ±1, ±2, ±3 . . .)
it gives us the approximate relative frequency, or probability,
$Y\Delta X = Y$ (since $\Delta X = 1$), with which the variate assumes the
value X. But we have not yet reached the continuous distribution
to which the binomial distribution tends as n increases
indefinitely and as the class intervals diminish indefinitely.
For values of X not positive or negative integers (5.2.5) is
meaningless. And, in any case, the formula, as it stands, is
valueless, since we have not evaluated $Y_0$. Clearly, $Y_0$ is the
probability that X assumes the value 0, or, what is the same,
that x assumes its mean value np. This, by (3.2.1), gives

$$Y_0 = \frac{n!}{(np)!\,(n - np)!}\,p^{np}q^{nq} \tag{5.2.6}$$

When N is large, we have an approximation, named after
Stirling, for N!

$$N! \simeq \sqrt{2\pi N}\cdot N^N\cdot e^{-N} \tag{5.2.7}$$

Using this approximation for n!, (np)! and (nq)! in (5.2.6), the
reader will verify that, with some easy simplification, this gives

$$Y_0 \simeq 1/(2\pi npq)^{1/2}$$

Therefore,

$$Y \simeq \frac{1}{\sqrt{2\pi npq}}\exp(-X^2/2npq) \tag{5.2.8}$$
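
As an informal check on (5.2.8), the following sketch compares exact Binomial probabilities with the approximation for one illustrative case. The function names and the choice n = 100, p = 0.5 are assumptions made for this example only; they do not come from the text.

```python
import math

def binomial_pmf(n, p, x):
    """Exact Binomial probability of x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_approx(n, p, x):
    """Approximation (5.2.8), with X = x - np the deviation from the mean."""
    q = 1 - p
    X = x - n * p
    return math.exp(-X * X / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)

# Illustrative values (assumed for the example).
n, p = 100, 0.5
for x in (40, 45, 50, 55, 60):
    print(x, round(binomial_pmf(n, p, x), 5), round(normal_approx(n, p, x), 5))
```

For n as large as this the two columns should agree closely near the mean; the agreement can be expected to worsen for small n or for p near 0 or 1.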

5.3. The Normal Probability Function. We now replace X
in (5.2.8) by $zn^{1/2}$. This has the effect of replacing the unit
class-interval of the X-distribution by one of $n^{-1/2}$ for the new
z-distribution. Thus, as n increases, the interval for z,
corresponding to unit interval for X, diminishes. Furthermore,
the probability that X lies in a particular unit interval will be
the probability that z lies in the z-interval corresponding to
that particular X-interval, and this is

$$Y\Delta X = \frac{1}{\sqrt{2\pi npq}}\exp(-X^2/2npq)\cdot\Delta X$$

But $\Delta X = n^{1/2}\cdot\Delta z$, and so the probability of z lying in the
interval $\Delta z$ is

$$\frac{1}{\sqrt{2\pi npq}}\exp(-z^2/2pq)\cdot n^{1/2}\Delta z, \quad\text{i.e.,}\quad \frac{1}{\sqrt{2\pi pq}}\exp(-z^2/2pq)\,\Delta z$$

Calling this probability $\Delta p(z)$, we have

$$\frac{\Delta p(z)}{\Delta z} \simeq \frac{1}{\sqrt{2\pi pq}}\exp(-z^2/2pq)$$

or, as n tends to infinity,

$$\frac{dp}{dz} = \frac{1}{\sqrt{2\pi pq}}\exp(-z^2/2pq)$$

Now npq was the variance of the original Binomial distribution.
Since we have put $z = Xn^{-1/2}$, the variance of the new continuous
distribution of z will be pq. Calling this $\sigma^2$, and writing x for
the variate, we have the probability density of the new distribution

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp(-x^2/2\sigma^2) \tag{5.3.1}$$

This is the Normal probability density function, defining the
Normal distribution.

5.4. Some Properties of the Normal Distribution.

(a) The range of a normally distributed variate with zero
mean and variance $\sigma^2$ is from $-\infty$ to $+\infty$.

(b) When the mean of the distribution is at $x = \mu$, the p.d.f.
is

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-(x - \mu)^2/2\sigma^2\right) \tag{5.4.1}$$

(c) The equation of the probability curve referred to its
mean as origin, the continuous curve to which the binomial
distribution approximates when n is very large, is

$$y = \phi(x)$$

This curve is symmetrical (since $\phi(-x) = \phi(x)$), unimodal
(mode, median and mean coincide) and such that y decreases
rapidly as the numerical value of x increases.

Fig. 5.4.1. Normal Distribution. (Horizontal axis marked from $-3\sigma$ to $+3\sigma$.)

(d) It follows from the symmetry of the distribution that
the mean-moments of odd order are all zero. Since $\phi(x)$ is an
even function of x, $x^{2r+1}\phi(x)$ is an odd function and,
consequently,

$$\mu_{2r+1} = \int_{-\infty}^{+\infty} x^{2r+1}\phi(x)\,dx = 0$$
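(e) The mean-moment generating function is

$$M_m(t) \equiv E(e^{tx}) = \int_{-\infty}^{+\infty} e^{tx}\phi(x)\,dx$$

$$= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp\left[-\frac{1}{2\sigma^2}\left(x^2 - 2\sigma^2 tx + \sigma^4 t^2\right) + \tfrac{1}{2}\sigma^2 t^2\right]dx$$

$$= \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\tfrac{1}{2}\sigma^2 t^2\right)\int_{-\infty}^{+\infty}\exp\left[-\frac{1}{2\sigma^2}\left(x - \sigma^2 t\right)^2\right]dx$$

$$= \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\tfrac{1}{2}\sigma^2 t^2\right)\int_{-\infty}^{+\infty}\exp\left(-y^2/2\sigma^2\right)dy$$

where $y = x - \sigma^2 t$.

$$\therefore\ M_m(t) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\tfrac{1}{2}\sigma^2 t^2\right)\cdot\sigma\sqrt{2\pi}\ ^{1}$$

or

$$M_m(t) = \exp\left(\tfrac{1}{2}\sigma^2 t^2\right) \tag{5.4.2}$$

Since the coefficients of all the odd powers of t in the
expansion of this function are zero, we see again that the
mean-moments of odd order are all zero. To find the mean-moments
of even order, we first find the coefficient of $t^{2r}$; this
is $(\tfrac{1}{2}\sigma^2)^r/r!$; then the coefficient of $t^{2r}/(2r)!$ is
$(\tfrac{1}{2}\sigma^2)^r\,(2r)!/r!$

Hence

$$\mu_{2r} = (\tfrac{1}{2})^r\sigma^{2r}\,(2r)!/r! = 1\cdot 3\cdot 5 \ldots (2r - 1)\,\sigma^{2r} \tag{5.4.3}$$

In particular, $\mu_2 = \sigma^2$ and $\mu_4 = 3\sigma^4$.

We also have a useful recurrence relation

$$\mu_{2r} = (2r - 1)\,\sigma^2\mu_{2r-2} \tag{5.4.4}$$

¹ By 3.C.3; alternatively, the probability that x shall take a
value somewhere between $-\infty$ and $+\infty$ is the total area under
the probability curve, i.e., $\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp(-x^2/2\sigma^2)\,dx$. But this
is certain. Hence

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp(-x^2/2\sigma^2)\,dx = 1, \quad\text{i.e.,}\quad \int_{-\infty}^{+\infty}\exp(-x^2/2\sigma^2)\,dx = \sigma\sqrt{2\pi}$$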
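
The closed form (5.4.3) and the recurrence (5.4.4) can be checked against each other numerically. The sketch below is only an illustration; the function names are hypothetical and the value of sigma is chosen arbitrarily.

```python
import math

def even_moment_closed(r, sigma):
    """mu_2r from (5.4.3): (1/2)^r * sigma^(2r) * (2r)! / r!"""
    return 0.5**r * sigma**(2 * r) * math.factorial(2 * r) / math.factorial(r)

def even_moment_recurrence(r, sigma):
    """mu_2r built up from mu_0 = 1 using (5.4.4): mu_2r = (2r - 1) sigma^2 mu_(2r-2)."""
    mu = 1.0
    for k in range(1, r + 1):
        mu *= (2 * k - 1) * sigma**2
    return mu

sigma = 1.5  # arbitrary illustrative value
for r in range(1, 6):
    print(2 * r, even_moment_closed(r, sigma), even_moment_recurrence(r, sigma))
```

With sigma = 1 the second and fourth moments reduce to 1 and 3, in agreement with the particular cases quoted above.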
(f) Since the area under the probability curve between
x = 0 and x = X is the probability that the variate x will
assume some value between these two values,

$$P(0 < x \le X) = \frac{1}{\sigma\sqrt{2\pi}}\int_0^X \exp(-x^2/2\sigma^2)\,dx \tag{5.4.5}$$

Now, referring to Fig. 5.4.1, we see that

$$\frac{1}{\sigma\sqrt{2\pi}}\int_0^{\infty}\exp(-x^2/2\sigma^2)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_0^X \exp(-x^2/2\sigma^2)\,dx + \frac{1}{\sigma\sqrt{2\pi}}\int_X^{\infty}\exp(-x^2/2\sigma^2)\,dx$$

But the integral on the left-hand side of this equation gives
half the area under the entire curve and is, therefore, equal to
½; also $\frac{1}{\sigma\sqrt{2\pi}}\int_0^X \exp(-x^2/2\sigma^2)\,dx = P(0 < x \le X)$ and
$\frac{1}{\sigma\sqrt{2\pi}}\int_X^{\infty}\exp(-x^2/2\sigma^2)\,dx = P(x > X)$.

Therefore

$$P(x > X) = \tfrac{1}{2} - P(0 < x \le X) \tag{5.4.6}$$

If, however, we are concerned only with the absolute value of
the variate, measured from its mean, then

$$P(|x| > X) = 1 - 2P(0 < x \le X) \tag{5.4.7}$$

This is frequently the case. Consider the following,
hypothetical, problem:

In a factory producing ball-bearings a sample of each
day's production is taken, and from this sample the mean
diameter and the standard deviation of the day's output
are estimated. The mean is in fact the specified diameter,
say 0.5 in.; the standard deviation is 0.0001 in. A bearing
whose diameter falls outside the range 0.5 ± 0.0002 in.
is considered substandard. What percentage of the day's
output may be expected to be substandard?

This is what is generally called a two-tail problem, for we are
concerned with the probability of a bearing having a diameter
which deviates from the mean by more than 0.0002 in. above
the mean and of a bearing having a diameter which deviates
by more than 0.0002 in. from the mean below the mean, i.e.,
we are concerned with finding the two (in this case, equal)
areas under the Normal curve between $-\infty$ and −0.0002 in.
and between +0.0002 in. and $+\infty$. This will be given by
1 − 2P(0 < x ≤ 0.0002). How, then, do we find probabilities
such as P(0 < x ≤ 0.0002)?

Consider (5.4.5) again. Let us take the standard deviation
of the distribution as the unit for a new variate and write
$t = x/\sigma$. Then (5.4.5) becomes

$$P(0 < t \le T) = \frac{1}{\sqrt{2\pi}}\int_0^T \exp(-t^2/2)\,dt \tag{5.4.8}$$

where $T = X/\sigma$.



Table 5.4. Area under the Normal Curve:
$P(t \le T) = \frac{1}{\sqrt{2\pi}}\int_0^T \exp(-t^2/2)\,dt$

T (= X/σ)    0      1      2      3      4      5      6      7      8      9

  0.0      .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
  0.1      .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
  0.2      .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
  0.3      .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
  0.4      .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
  0.5      .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
  0.6      .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
  0.7      .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
  0.8      .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
  0.9      .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
  1.0      .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
  1.1      .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
  1.2      .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
  1.3      .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
  1.4      .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
  1.5      .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
  1.6      .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
  1.7      .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
  1.8      .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
  1.9      .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
  2.0      .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
  2.1      .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
  2.2      .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
  2.3      .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
  2.4      .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
  2.5      .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
  2.6      .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
  2.7      .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
  2.8      .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
  2.9      .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986

T (= X/σ)   3.0    3.1    3.2    3.3    3.4    3.5    3.6    3.7    3.8    3.9

           .4987  .4990  .4993  .4995  .4997  .4998  .4998  .4999  .4999  .5000
Now P(t ≤ T) is the area under the curve $y = \frac{1}{\sqrt{2\pi}}\exp(-t^2/2)$
between t = 0 and t = T. The integral

$$\frac{1}{\sqrt{2\pi}}\int_0^T \exp(-t^2/2)\,dt$$

frequently called the probability integral, is a function of T.
It cannot, however, be evaluated in finite form, but if we
expand the integrand and integrate term by term, the integral
can be computed to any degree of accuracy required. Table
5.4 gives values of this integral for T = 0 to T = 3.9, correct
to four decimal places.

Worked Example: Find the value of $\frac{1}{\sqrt{2\pi}}\int_0^{0.500}\exp(-t^2/2)\,dt$ correct
to four decimal places using four terms of the expansion of the
integrand.

We have

$$\exp(-t^2/2) = 1 - t^2/2 + t^4/(4\cdot 2!) - t^6/(8\cdot 3!) + \ldots$$

Therefore

$$\frac{1}{\sqrt{2\pi}}\int_0^T \exp(-t^2/2)\,dt = \frac{1}{\sqrt{2\pi}}\left(T - T^3/6 + T^5/40 - T^7/336 \ldots\right)$$

Now T = 0.50000; $T^3$ = 0.12500 and $T^3/6$ = 0.02083; $T^5$ =
0.03125 and $T^5/40$ = 0.00078; $T^7$ = 0.00781 and $T^7/336$ = 0.00002.

Taking $1/\sqrt{2\pi}$ = 0.39894, we have $\frac{1}{\sqrt{2\pi}}\int_0^{0.500}\exp(-t^2/2)\,dt$ =
0.1914, which should be compared with that of 0.1915 given in
Table 5.4. This method is satisfactory when T ≤ 1, but, for larger
values, the successive terms of the expansion diminish too slowly
and, to overcome this, we use a modified method:

$$\frac{1}{\sqrt{2\pi}}\int_0^T \exp(-t^2/2)\,dt = \frac{1}{\sqrt{2\pi}}\int_0^{\infty}\exp(-t^2/2)\,dt - \frac{1}{\sqrt{2\pi}}\int_T^{\infty}\exp(-t^2/2)\,dt$$

$$= 0.5 - \frac{1}{\sqrt{2\pi}}\int_T^{\infty}\frac{1}{t}\exp(-t^2/2)\,t\,dt$$

and, integrating successively by parts, we have, for T > 1,

$$\frac{1}{\sqrt{2\pi}}\int_0^T \exp(-t^2/2)\,dt = 0.5 - \frac{1}{\sqrt{2\pi}}\cdot\frac{\exp(-T^2/2)}{T}\left[1 - \frac{1}{T^2} + \frac{1\cdot 3}{T^4} - \frac{1\cdot 3\cdot 5}{T^6} + \frac{1\cdot 3\cdot 5\cdot 7}{T^8} - \ldots\right]$$

where $1/\sqrt{2\pi}$ = 0.39894228.
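
Both methods of computing the probability integral can be sketched in a few lines. The function names below are hypothetical, and Python's `math.erf` is used only as an independent reference value; it is not the method of the text. Note also that the bracketed series of the modified method is an asymptotic one, so taking more and more terms does not improve the result indefinitely.

```python
import math

INV_SQRT_2PI = 1 / math.sqrt(2 * math.pi)

def prob_integral_series(T, terms=4):
    """Term-by-term integration: T - T^3/6 + T^5/40 - T^7/336 + ..."""
    total = 0.0
    for k in range(terms):
        total += (-1)**k * T**(2 * k + 1) / (2**k * math.factorial(k) * (2 * k + 1))
    return INV_SQRT_2PI * total

def prob_integral_modified(T, terms=5):
    """Modified method: 0.5 - exp(-T^2/2)/(T sqrt(2 pi)) * [1 - 1/T^2 + 1.3/T^4 - ...]"""
    bracket, term = 0.0, 1.0
    for k in range(terms):
        bracket += term
        term *= -(2 * k + 1) / T**2  # next term of the bracketed series
    return 0.5 - INV_SQRT_2PI * math.exp(-T * T / 2) / T * bracket

def prob_integral_reference(T):
    """Reference value via the error function (an assumption of this sketch)."""
    return 0.5 * math.erf(T / math.sqrt(2))

print(prob_integral_series(0.5), prob_integral_reference(0.5))
print(prob_integral_modified(3.0), prob_integral_reference(3.0))
```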



Exercise: Find $\frac{1}{\sqrt{2\pi}}\int_0^{2.0}\exp(-t^2/2)\,dt$ correct to four decimal places
and check the result with the value given in Table 5.4.

We may now return to our ball-bearing problem. The
standard deviation of the sample was 0.0001 in. Since we have
to find P(x > 0.0002), X = 0.0002 and T = 0.0002/0.0001 =
2. Hence the probability that the diameter of a bearing will
lie between 0.5 and 0.5002 in. is 0.4772. Therefore the
probability that the diameter will exceed 0.5002 in. is 0.5 −
0.4772 = 0.0228. Since the Normal distribution is symmetrical,
the probability of a bearing with a diameter less than
0.4998 in. will also be 0.0228. Hence the probability that the
diameter of a bearing will lie outside the tolerance limits will
be 0.0456. This means that we should expect, on the data
available, just over 4½% of the bearings produced on the day
in question to be substandard.
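
The same two-tail calculation can be sketched in code. Here `math.erf` stands in for Table 5.4, which is an assumption of this illustration, and the variable names are hypothetical.

```python
import math

def central_area(T):
    """Area under the standard Normal curve between 0 and T."""
    return 0.5 * math.erf(T / math.sqrt(2))

# Data of the ball-bearing problem.
sd = 0.0001         # standard deviation, in.
tolerance = 0.0002  # permitted deviation from the mean, in.

T = tolerance / sd                       # deviation in standard units
one_tail = 0.5 - central_area(T)         # P(x > X), as in (5.4.6)
fraction_substandard = 2 * one_tail      # both tails, as in (5.4.7)

print(round(100 * fraction_substandard, 2), "% substandard")
```

The small difference from the 0.0456 obtained above arises because the tabulated value 0.4772 is rounded to four places.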

(g) If we pick at random a value of a variate known to be
distributed normally about zero mean with variance $\sigma^2$, what is
the probability that this random value will deviate by more than
$\sigma$, $2\sigma$, $3\sigma$ from the mean?

Entering Table 5.4 at T = 1.00, we find that the area between
the mean ordinate and that for T = 1.00 is 0.3413. This is
the probability that the random value of the variate will lie
between 0 and $\sigma$. By symmetry, the probability that it will
lie between 0 and $-\sigma$ is also 0.3413. Thus the probability of
it lying between $-\sigma$ and $+\sigma$ is 2 × 0.3413 = 0.6826.
Consequently the probability that it will deviate from the mean
by more than $\sigma$ in either direction is 1 − 0.6826 = 0.3174, or
less than 1/3.

Similarly, the probability that a random value will lie
between $-2\sigma$ and $+2\sigma$ is 0.9544; this means that the
probability that it will lie outside this range is 0.0456, or only about
4½% of a normally distributed population deviate from the
mean by more than $2\sigma$.

Likewise, as the reader may ascertain for himself, the
probability of a deviation greater than $3\sigma$ is only 0.0027.

(h) Suppose now that we were to plot values of the integral

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{X}\exp(-x^2/2\sigma^2)\,dx$$

against the value of X. This is not possible over the full
range, $-\infty$ to $+\infty$, but since we have just found that
deviations of more than $3\sigma$ from the mean are very rare, we
can confine our attention to that section of the range lying
between $-3.9\sigma$ and $+3.9\sigma$, the range covered by Table 5.4.
If we do this we obtain a cumulative probability curve for the
Normal distribution. The function F(X) defined by

$$F(X) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{X}\exp(-x^2/2\sigma^2)\,dx \tag{5.4.9}$$

is called the Normal distribution function. Clearly

$$F(X) = \tfrac{1}{2} + P(0 < x \le X) \tag{5.4.10}$$

The graph of a typical F(X) is shown in Fig. 5.4.2. From
(5.4.9) it follows that the value of the ordinate of this graph
at X is equal to the area under the curve of the probability
density function $\phi(x)$ from $-\infty$ to X. If, however, we plot
F(X) against X on probability graph paper (Fig. 5.4.3) the
resultant cumulative probability curve is a straight line. There
is nothing strange in this, for probability paper is deliberately
designed to ensure that this will happen!

Fig. 5.4.2. Curve of Normal Distribution Function.

5.5. Binomial, Poisson, Normal. When n is large the
Binomial distribution tends towards the Normal distribution
with mean at the Binomial mean value and variance equal to
that of the discrete distribution. Furthermore, as we have
also seen, when the mean of the Poisson distribution, also a
discrete distribution, is very large, the Poisson probability
polygon tends towards symmetry. In fact, when m is large,
the Poisson distribution also tends to normality.




Fig. 5.4.3.— Cumulative Probability Curves on Normal 
Probability Paper. 



Worked Example: Use the Normal distribution to find approximately
the frequency of exactly 5 successes in 100 trials, the probability of
a success in each trial being p = 0.1.

The mean of the binomial distribution is np = 10 and the variance,
npq, = 9. The standard deviation is, therefore, 3. The
binomial class-interval for 5 will then correspond to the interval
−4.5 to −5.5 of the normal distribution (referred to its mean as
origin) and dividing by $\sigma$ = 3, this, in standardised units, is −1.50
to −1.83. Owing to the symmetry of the normal curve, we may
disregard the negative signs, and entering Table 5.4 at 1.50 and
1.83, we read 0.4332 and 0.4664 respectively. Hence the probability
of 5 successes is approximately 0.4664 − 0.4332 = 0.0332. Thus
in 100 trials the frequency of 5 successes will be approximately 3.32.
The reader should verify for himself that direct calculation of the
binomial frequency gives 3.39, while the frequency obtained from
the Poisson series with m = 10 gives 3.78.
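
The three figures of this worked example can be reproduced with the short sketch below. It is an illustration under stated assumptions: `math.erf` replaces Table 5.4, and the variable names are hypothetical.

```python
import math

def central_area(T):
    """Signed area under the standard Normal curve between 0 and T."""
    return 0.5 * math.erf(T / math.sqrt(2))

n, p, x = 100, 0.1, 5
q = 1 - p
mean, sd = n * p, math.sqrt(n * p * q)

# Normal approximation: area over the class-interval x - 0.5 to x + 0.5.
lower = (x - 0.5 - mean) / sd
upper = (x + 0.5 - mean) / sd
normal_prob = central_area(upper) - central_area(lower)

# Direct Binomial calculation and Poisson term with m = np.
binomial_prob = math.comb(n, x) * p**x * q**(n - x)
poisson_prob = math.exp(-mean) * mean**x / math.factorial(x)

for name, prob in (("Normal", normal_prob), ("Binomial", binomial_prob), ("Poisson", poisson_prob)):
    print(name, round(100 * prob, 2))
```

The Normal figure differs slightly from 3.32 because the text rounds the standardised limits to −1.50 and −1.83 before entering the table.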

5.6. Three Examples. We conclude this chapter with two 
other typical problems in the treatment of which we make use 
of some of the properties of the Normal distribution. The 
reader should work each step himself, following carefully the 
directives given. 

Example 1: To fit a Normal curve to the distribution given in Table
5.1.

First Treatment: (1) Draw the frequency histogram for the data.
The height 67 in. represents heights between 66½ and 67½ in.

(2) Calculate the Mean and Standard Deviation, correcting for
grouping (2.15) the value of the latter. It will be found that the
mean is 67.852 in., and the standard deviation 2.60 in.

(3) The normal curve corresponding to these values for the mean
and standard deviation is drawn using Table 5.5.
For each class-interval, work out the deviation, from the mean, of
each of the boundary values. For instance, the boundary values
for the interval "69 in." are 68.975 and 69.975. The respective
deviations from the mean are 1.123 and 2.123. Making the
transformation $T = X/\sigma$, we divide each of these by 2.60, obtaining,
correct to three decimal places, 0.432 and 0.817. Now Table 5.5
gives the values of $\sigma$ times the ordinate of the normal curve, $y = \phi(x)$.
But since the total frequency of the distribution is 58,703 and the
area under the normal curve is unity, each of the two values read
from Table 5.5 must be multiplied by the factor 58,703/2.60 = 22,578.
Entering Table 5.5 at 0.432 and interpolating, we have 0.3634,
and at 0.817, 0.2833. The ordinates corresponding to the end
points of the interval "69 in." for the fitted normal frequency curve
are thus 22,578 × 0.3634 = 8,205, and 22,578 × 0.2833 = 6,396.
Proceeding in this way, we plot the required curve.

Table 5.5. Ordinates of Normal Curve Multiplied by
Standard Deviation

Divide each value by $\sigma$ to obtain $y = \phi(x)$.

(4) We now calculate the theoretical frequency for each
class-interval in order to compare it with the corresponding observed
frequency. This time we use Table 5.4. The area under the
normal curve from the mean to the right-hand class-boundary
(0.817) is, from this table, 0.2930; the area from the mean to the
left-hand boundary (0.432) is 0.1671. The difference between these
two values multiplied by the total frequency, 0.1259 × 58,703 =
7,391, is our theoretical frequency for this interval. The
corresponding observed frequency, note, is 7,132. The reader should
complete the calculations and draw the theoretical frequency
polygon.
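
Step (4) can be repeated for every class-interval with a small helper. This is a sketch only: the function names are hypothetical and `math.erf` takes the place of Table 5.4, so the result differs slightly from the table-based figure of 7,391.

```python
import math

TOTAL = 58703   # total frequency
MEAN = 67.852   # mean height, in.
SD = 2.60       # standard deviation, in.

def central_area(T):
    """Signed area under the standard Normal curve between 0 and T."""
    return 0.5 * math.erf(T / math.sqrt(2))

def theoretical_frequency(lower, upper):
    """Fitted Normal frequency for the class-interval with the given boundaries."""
    t_lower = (lower - MEAN) / SD
    t_upper = (upper - MEAN) / SD
    return TOTAL * (central_area(t_upper) - central_area(t_lower))

# The interval "69 in." has boundaries 68.975 and 69.975.
freq_69 = theoretical_frequency(68.975, 69.975)
print(round(freq_69))
```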

Second Treatment: (1) Draw the cumulative frequency polygon for
the data of Table 5.1.

(2) Draw the theoretical cumulative normal frequency curve
with mean 67.852 and s.d. 2.60. This is done using Table 5.4 as
follows:

To find the ordinate of the cumulative frequency curve at, say,
the lower end-point of the interval "64 in.", i.e., at 63.975 in., we
have to find the area under the normal curve from $-\infty$ to X =
63.975. But this is ½ − (area under curve between mean, 67.852,
and the ordinate at 63.975). The deviation from the mean is
−3.877. But the area under the normal curve between X = 0
and X = −3.877 is, by symmetry, the area under the curve
between X = 0 and X = +3.877. Dividing 3.877 by 2.60, we
obtain 1.491; entering Table 5.4 at this value, we read 0.4319.
The required ordinate of the cumulative frequency curve is then
given by 58,703 × (0.5000 − 0.4319) = 3,997. The reader should
calculate the other ordinates in a similar manner and complete the
curve.

(3) If now we mark upon the vertical axis percentage cumulative
frequencies (with 58,703 as 100%), we can find the position of the
median and other percentiles. (Median: 67.7 in.; quartiles: 66.0
and 69.5 in.; deciles: 64.4 and 71.1 in.)

Example 2: To find, using probability graph paper, approximate
values for the mean and standard deviation of an observed frequency
distribution which is approximately normal.

Treatment: When plotted on probability graph paper, the
cumulative frequency curve of a normal distribution is a straight
line. If, then, we draw the cumulative relative-frequency polygon
of an observed distribution on such paper and find that it is
approximately a straight line, we may assume that the distribution is
approximately normal. We next draw the straight line to which
the polygon appears to approximate. Then, working with this
"fitted" line:

(a) since, for the normal distribution, mean and median coincide
and the median is the 50th percentile, if we find the 50th percentile,
we shall have a graphical estimate of the mean of the observed
distribution;

(b) the area under the normal curve between $-\infty$ and $\mu + \sigma$ is
0.5000 + 0.3413 = 0.8413. Thus 84.13% of the area under the
normal curve lies to the left of the ordinate at $\mu + \sigma$. So the 84.13th
percentile corresponds to a deviation of $+\sigma$ from the mean. If,
therefore, from the fitted cumulative frequency line we find the
position of the 84th percentile, the difference between this and the
mean will give us an estimate of $\sigma$ for the observed distribution.
Likewise, the difference between the 16th percentile and the median
will also be an estimate of $\sigma$.
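
The percentile method of this example can be imitated numerically. In the sketch below the "observed" cumulative relative frequencies are generated from a Normal distribution with mean 10 and standard deviation 2 (illustrative values, assumed for the example), and the percentiles are read off by straight-line interpolation between class boundaries rather than from a line drawn on probability paper. The helper name is hypothetical.

```python
from statistics import NormalDist

def percentile_from_cumulative(boundaries, cum_rel, level):
    """Linear interpolation for the variate value at a given cumulative level."""
    for i in range(1, len(boundaries)):
        lo, hi = cum_rel[i - 1], cum_rel[i]
        if lo <= level <= hi:
            frac = (level - lo) / (hi - lo)
            return boundaries[i - 1] + frac * (boundaries[i] - boundaries[i - 1])
    raise ValueError("level lies outside the tabulated range")

# Illustrative cumulative distribution (assumed, not data from the text).
source = NormalDist(mu=10.0, sigma=2.0)
boundaries = [4.0 + 0.5 * k for k in range(25)]   # class boundaries 4.0 to 16.0
cum_rel = [source.cdf(b) for b in boundaries]

median = percentile_from_cumulative(boundaries, cum_rel, 0.5)
sd_upper = percentile_from_cumulative(boundaries, cum_rel, 0.8413) - median
sd_lower = median - percentile_from_cumulative(boundaries, cum_rel, 0.1587)

print(median, sd_upper, sd_lower)
```

With real data the two estimates of the standard deviation will generally differ a little, and their disagreement gives a rough indication of departure from normality.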

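Example 3: The frequency distribution f(x) is obtained from the
normal distribution $N(t) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac{1}{2}t^2)$, by means of the
equations

(i) $\int_1^x f(x)\,dx = \int_{-\infty}^{t} N(t)\,dt$, and (ii) $t = a\log(x - 1)$.

If $\exp(1/a^2) = 4$, show that the median of f(x) is 2, the mean is 3
and the mode is 1.25. (L.U.)

Treatment: As $x \to 1$, $t \to -\infty$; as $x \to +\infty$, $t \to +\infty$.
Also $\int_1^{\infty} f(x)\,dx = \int_{-\infty}^{+\infty} N(t)\,dt = 1$. Hence the median value of f(x)
is given by $\int_1^x f(x)\,dx = \tfrac{1}{2} = \int_{-\infty}^{t} N(t)\,dt$, i.e., by t = 0, so that
$0 = a\log(x - 1)$, i.e., since log 1 = 0, x = 2.

The mean is $\bar{x} = \int_1^{\infty} x f(x)\,dx$. But $x = 1 + e^{t/a}$.

Hence

$$\bar{x} = \int_{-\infty}^{+\infty}\left(1 + e^{t/a}\right)N(t)\,dt = 1 + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp\left(-\tfrac{1}{2}t^2 + t/a\right)dt$$

$$= 1 + e^{1/2a^2}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp\left[-\tfrac{1}{2}\left(t - \frac{1}{a}\right)^2\right]dt$$

i.e., $\bar{x} = 1 + e^{1/2a^2} = 1 + (4)^{1/2} = 3$.

Differentiating (i) with respect to t, we have

$$f(x)\frac{dx}{dt} = N(t)$$

or

$$f(x) = a e^{-t/a}N(t) = \frac{a}{\sqrt{2\pi}}\exp\left[-\tfrac{1}{2}\left(t^2 + \frac{2t}{a}\right)\right]$$

$$\therefore\ \frac{df}{dx} = \frac{df}{dt}\cdot\frac{dt}{dx} = -\frac{a^2}{\sqrt{2\pi}}\,e^{-t/a}\exp\left[-\tfrac{1}{2}\left(t^2 + 2t/a\right)\right]\left[t + \frac{1}{a}\right]$$

If then $\frac{df}{dx} = 0$, which defines the modal value, we must have

$$t = -\frac{1}{a}$$

Thus $x - 1 = e^{-1/a^2} = \tfrac{1}{4}$ or x = 1.25.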
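
The three results of Example 3 can be confirmed numerically. The sketch below is an illustration only; the integration range, step sizes and search grid are arbitrary choices made for the example, and natural logarithms are assumed.

```python
import math

a = 1 / math.sqrt(math.log(4))   # chosen so that exp(1/a^2) = 4

def N(t):
    """Standard Normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def f(x):
    """Transformed density f(x) = N(t) dt/dx with t = a log(x - 1)."""
    t = a * math.log(x - 1)
    return a / (x - 1) * N(t)

# Median: t = 0 corresponds to x = 1 + e^0.
median = 1 + math.exp(0.0)

# Mean: midpoint rule for the integral of (1 + e^(t/a)) N(t) over -10 < t < 10.
steps, lo, hi = 20000, -10.0, 10.0
h = (hi - lo) / steps
mean = sum((1 + math.exp((lo + (k + 0.5) * h) / a)) * N(lo + (k + 0.5) * h) * h
           for k in range(steps))

# Mode: crude grid search for the maximum of f(x) on 1 < x <= 3.
grid = [1 + 0.0005 * k for k in range(1, 4001)]
mode = max(grid, key=f)

print(median, mean, mode)
```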



EXERCISES ON CHAPTER FIVE

1. Fit a normal curve to the distribution of lengths of metal bars
given in 2.1.

2. The wages of 1,000 employees range from 4s. 6d. to 19s. 6d.
They are grouped in 15 classes with a common class interval of 1s.
The class frequencies, from the lowest class to the highest, are 6, 17,
35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6. Show that the
mean wage is 12.006s. and the standard deviation 2.626s. Fit a
normal distribution, showing that the class frequencies per thousand
of the normal distribution are approximately 6.7, 11.3, 25.0, 48.0,
79.0, 113.1, 140.5, 151.0, 140.8, 113.5, 79.5, 48.1, 25.3, 11.5 and 6.7.

(Weatherburn, Mathematical Statistics.)

3. A machine makes electrical resistors having a mean resistance
of 50 ohms with a standard deviation of 2 ohms. Assuming the
distribution of values to be normal, find what tolerance limits
should be put on the resistance to secure that no more than 1/1000
of the resistors will fail to meet the tolerances. (R.S.S.)

4. Number of individual incomes in different ranges of net income
assessed in 1945-46:

Range of Income         Number of
after tax (x), £        Incomes

150-500                 13,175,000
500-1,000                  652,000
1,000-2,000                137,500
2,000 and over              35,500

Total                   14,000,000

Assume that this distribution of incomes, f(x), is linked with the
normal distribution

$$N(t) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac{1}{2}t^2)$$

by the relationship

$$\int_{-\infty}^{t} N(t)\,dt = \int_{150}^{x} f(x)\,dx, \quad\text{where } t = a\log(x - 150) + b.$$

Obtain estimates for a and b from the data, and find the number of
incomes between £250 and £500.

5. Show that $\beta_2$ for a normal distribution is equal to 3.

6. If p = 1/5, use the normal distribution to estimate the
probability of obtaining less than five or more than 15 successes in 50
trials. What is the actual probability?

Solutions

3. ± 6.6 ohms.

4. See Example 3 of 5.6. a = 0.70, b = −2.49; number of incomes
between £250 and £500 is 2.4 × 10⁶, to two significant figures.

6. 0.0519; 0.0503.



CHAPTER SIX

MORE VARIATES THAN ONE: BIVARIATE
DISTRIBUTIONS, REGRESSION AND CORRELATION

6.1. Two Variates. In the last chapter we discussed the
distribution of height among 58,703 National Servicemen born
in 1933 who entered the Army in 1951. We could have
discussed the distribution of weight among them. In either
case, the distribution would have been univariate, the distribution
of one measurable characteristic of the population or
sample. But had we considered the distribution of both height
and weight, a joint distribution of two variates, we should have
had a bivariate distribution.

6.2. Correlation Tables. How do we tabulate such a distribution?
To each National Serviceman in the total of 58,703,
there corresponds a pair of numbers, his weight, x lb. say, and
his height, y in. Let us group the heights in 2-in. intervals
and the weights in 10-lb. intervals. Some men will be classed
together in the same weight-group (call it the $x_i$ group) but
will be in different height-groups; others will occupy the
same height-group (the $y_j$ group, say) but different weight
groups; but there will be some in the same weight-group and
the same height-group, the group $(x_i, y_j)$ for short. Denote
the number of men in this class-rectangle by $f_{ij}$. The joint
distribution may then be tabulated as in Table 6.2.1. A general
scheme is given in Table 6.2.2.

Note:

(i) $x_i$ is the mid-value of the class-interval of the ith x-array; $y_j$
is the mid-value of the jth y-array.

(ii) If the data is not grouped (to each value of x corresponds
but one value of y and to each y corresponds but one value
of x), the correlation table becomes:

The $(x_i, y_j)$ group in Table 6.2.2, for instance, corresponds to
that of those men whose weights are in the 130-139-lb. weight
class and whose heights are in the 65-in. height class, and $f_{ij}$ =
3,879. Such a table is called a correlation table. Each row
and each column tabulates the distribution of one of the
variates for a given value of the other. Thus each row (or
column) gives a univariate frequency distribution. A row or
column is often called an array: the $x_i$ array, for example, is
that row of y-values for which $x = x_i$.
Table 6.2.2. Correlation Table for Grouped Data 
6.3. Scatter Diagrams, Stereograms. How do we display
such a distribution? If we confine ourselves to two dimensions,
we make a scatter-diagram. A pair of rectangular axes is
taken, the abscissae being values of one of the variates, the
ordinates those of the other. So to every pair of values
$(x_i, y_j)$ there will correspond a point in the plane of the axes.
If we plot these points, we have a scatter-diagram. The
main disadvantage of this method of display is that it is not
well suited to the representation of grouped data, for it is
difficult to exhibit a number of coincident points! Nevertheless,
a scatter-diagram is very often suggestive of directions
along which further investigation may prove fruitful. (Figs.
6.3.1 (a), (b), (c) and (d).)

To represent a grouped bivariate distribution in three
dimensions, mark off on mutually perpendicular axes in a
horizontal plane the class-intervals of the two variates. We
thus obtain a network of class-rectangles. On each of these
rectangles we erect a right prism of volume proportional to
the occurrence-frequency of the value-pair represented by the
rectangle in question. In this way we obtain a surface composed
of horizontal rectangular planes. This is a prismogram
or stereogram, corresponding in three dimensions to the
histogram in two. Alternatively, at the centre of each class
rectangle, we may erect a line perpendicular to the horizontal
plane proportional in length to the frequency of the variates
in that class-rectangle. If we then join up all the points so
obtained by straight lines, we obtain the three-dimensional
analogue of the two-dimensional frequency polygon. Now, if
we regard our distribution as a sample from a continuous
bivariate population parent distribution, we can also think of a
relative-frequency prismogram as a rough sample approximation
to that ideal, continuous surface, the correlation surface,
which represents the continuous bivariate probability distribution
in the parent population.

Fig. 6.3.2. Frequency Contours of Bivariate Distribution. (Horizontal axis: height in inches, 59 to 77.)

Three-dimensional figures, however, also have their
disadvantages, and we frequently find it convenient to return
to two-dimensional diagrams representing sections through the
three-dimensional surface. Thus if we cut the surface with a
horizontal plane we obtain a contour of the surface corresponding
to the particular frequency (or probability) represented by
the height of the plane above the plane of the variate axes.
Fig. 6.3.2 shows frequency contours of ten, a hundred and a
thousand men in the groups of Table 6.2.1. It also shows
mean weights at each height and mean heights at each weight.

If, however, we cut the surface by a plane corresponding to a
given value of one of the variates, we obtain the frequency (or
probability) curve of the other variate for that given value of
the first.

6.4. Moments of a Bivariate Distribution. We confine
ourselves here to the discussion of bivariate distributions with
both variates discrete. A brief treatment of continuous
bivariate distributions is given in the Appendix.

We define the moment of order r in x and s in y about x = 0,
y = 0 for the distribution of Table 6.2.2 as follows:

$$N m_{rs}' = f_{11}x_1^r y_1^s + f_{12}x_1^r y_2^s + \ldots + f_{ij}x_i^r y_j^s + \ldots + f_{pq}x_p^r y_q^s$$

or

$$m_{rs}' = \frac{1}{N}\sum_i\sum_j f_{ij}x_i^r y_j^s \tag{6.4.1}$$

where i = 1, 2, 3, . . . p; j = 1, 2, 3, . . . q and $N = \sum_i\sum_j f_{ij}$, the
total frequency of the distribution.

In particular, we have

$$N m_{10}' = \sum_i\sum_j f_{ij}x_i$$

and, summing for j,

$$N m_{10}' = \sum_i\left(f_{i1} + f_{i2} + \ldots + f_{iq}\right)x_i$$

Writing $\sum_j f_{ij} = f_{i.}$, the total frequency of the value $x_i$,
$m_{10}' = \frac{1}{N}\sum_i f_{i.}x_i$, the mean value of x in the sample. Denoting this
mean by $\bar{x}$, we have

$$m_{10}' = \bar{x} \tag{6.4.2}$$

and, likewise,

$$m_{01}' = \bar{y} \tag{6.4.3}$$

Again, $m_{20}' = \frac{1}{N}\sum_i\sum_j f_{ij}x_i^2 = \frac{1}{N}\sum_i f_{i.}x_i^2$, the second
moment of x about the origin. Writing $X_i = x_i - \bar{x}$,
$Y_j = y_j - \bar{y}$,

$$m_{20}' = \frac{1}{N}\sum_i f_{i.}\left(X_i + \bar{x}\right)^2 = \frac{1}{N}\sum_i f_{i.}\left(X_i^2 + 2\bar{x}X_i + \bar{x}^2\right)$$

$$= \frac{1}{N}\sum_i f_{i.}X_i^2 + \bar{x}^2, \quad\text{since } \frac{1}{N}\sum_i f_{i.}X_i = 0.$$

Denoting the variance of x by $s_x^2$ and var (y) by $s_y^2$, we have

$$m_{20}' = s_x^2 + \bar{x}^2 = m_{20} + (m_{10}')^2 \tag{6.4.4}$$

and, similarly,

$$m_{02}' = s_y^2 + \bar{y}^2 = m_{02} + (m_{01}')^2 \tag{6.4.5}$$

where, of course, $m_{20}$ and $m_{02}$ are the moments of order 2, 0
and 0, 2 about the mean.

Now consider

$$m_{11}' = \frac{1}{N}\sum_i\sum_j f_{ij}x_i y_j = \frac{1}{N}\sum_i\sum_j f_{ij}\left(X_i + \bar{x}\right)\left(Y_j + \bar{y}\right)$$

$$= \frac{1}{N}\sum_i\sum_j\left(f_{ij}X_i Y_j + \bar{x}f_{ij}Y_j + \bar{y}f_{ij}X_i + f_{ij}\bar{x}\bar{y}\right)$$

The quantity $\frac{1}{N}\sum_i\sum_j f_{ij}X_i Y_j$ is called the covariance of x and
y and is variously denoted by $s_{xy}$ or cov (x, y). We may therefore
write

$$m_{11}' = m_{11} + m_{10}'\cdot m_{01}'$$

or

$$\text{cov}(x, y) = s_{xy} = m_{11}' - m_{10}'\cdot m_{01}' = m_{11}' - \bar{x}\bar{y} \tag{6.4.6}$$
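
The relation (6.4.6) can be illustrated on a small correlation table. The table below is entirely hypothetical (three x-values, two y-values), and the function name is an assumption made for the example.

```python
def moment(xs, ys, f, r, s):
    """Moment m'_rs about the origin, as in (6.4.1), for a table of frequencies f[i][j]."""
    total = sum(sum(row) for row in f)
    acc = sum(f[i][j] * xs[i]**r * ys[j]**s
              for i in range(len(xs)) for j in range(len(ys)))
    return acc / total

# Hypothetical correlation table: rows are x-values, columns are y-values.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0]
f = [[2, 0],
     [1, 1],
     [0, 2]]

x_bar = moment(xs, ys, f, 1, 0)
y_bar = moment(xs, ys, f, 0, 1)

# Covariance from (6.4.6): m'_11 minus the product of the means.
cov_from_origin = moment(xs, ys, f, 1, 1) - x_bar * y_bar

# Covariance computed directly from deviations X_i, Y_j about the means.
N = sum(sum(row) for row in f)
cov_direct = sum(f[i][j] * (xs[i] - x_bar) * (ys[j] - y_bar)
                 for i in range(len(xs)) for j in range(len(ys))) / N

print(x_bar, y_bar, cov_from_origin, cov_direct)
```

The two covariance values agree, as (6.4.6) requires; the same helper gives the variances through (6.4.4) and (6.4.5).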

6.5. Regression. When we examine the data provided by a 
sample from a bivariate population, one of the things we wish 
to ascertain is whether there is any evidence of association 
between the variates of the sample, and whether, if such an 
association is apparent, it warrants the inference that a 
corresponding association exists between the variates in the 
population. We also wish to know what type of association, 
if any, exists. 

Frequently, if our sample is of sufficient size, a scatter-diagram
of the sample data provides a clue. If, for instance,
there is a fairly well-defined locus of maximum "dot-density"
in the diagram, and if, when we increase the sample size, this
locus "condenses", as it were, more and more to a curve, we
may reasonably suspect this curve to be the smudged reflection
of a functional relationship between the variates in the population,
the smudging resulting from the hazards of random
sampling. In Fig. 6.3.1 (a) and (b) we have scatter diagrams
of samples from populations in which the variates are linearly
related. If, however, the dots do not appear to cluster around
or condense towards some fairly definitely indicated curve, and
yet are not distributed at random all over the range of the
sample, occupying rather a fairly well-limited region, as for
example in Fig. 6.3.1 (c), it is clear that, while we cannot assume
a functional relationship, nevertheless, perhaps as a result of
the operation of unknown or unspecified factors, the variates
do tend to vary together in a rough sort of way; we then say
that the chances are that the variates are stochastically or
statistically related. It may be, however, that the scatter-diagram
is such that the dots are pretty uniformly distributed
over the whole of the sample range and exhibit no tendency to
cluster around a curve or to occupy a limited region; in this
case, we may suspect that there is no association between the
variates, which, if this were so, would be called statistically
independent (Fig. 6.3.1 (d)).

[Fig. 6.3.1 (a): Scatter Diagram (I).]

We cannot rest content with such a purely qualitative test
and must devise a more sensitive, analytical technique. Now
it is reasonable to assume that if there is some tendency for x
and y to vary together either functionally or stochastically, it
will be more evident if we plot the mean value of each y-array
against the corresponding value of x, and the mean value of
each x-array against the corresponding value of y. In practice,
it is customary to denote the means of x-arrays by small
circles and those of y-arrays by small crosses.

Let the mean of the y-array corresponding to the value $x = x_i$
be $\bar{y}_i$, and the mean of the x-array corresponding to $y = y_j$ be $\bar{x}_j$.
If we plot the set of points $(x_i, \bar{y}_i)$ and the set $(\bar{x}_j, y_j)$, we shall
find that in general each set will suggest a curve along which
or near which the component points of that set lie.

Increasing the sample size will generally tend more clearly
to define these curves. We call these curves regression curves
and their equations regression equations: that curve suggested
by the set of points $(x_i, \bar{y}_i)$ is the regression curve of y on x;
that suggested by the set of points $(\bar{x}_j, y_j)$ is the regression
curve of x on y. The former gives us some idea how y varies
with x, the latter some idea of how x changes with y. And it
is intuitively fairly obvious that if there is a direct functional
relationship between the variates in the population sampled these
two regression curves will tend to coincide.

6.6. Linear Regression and the Correlation Coefficient. 
If the regression curves are straight lines, we say that the 
regression is linear; if not, then the regression is curvilinear. 
To begin with we confine ourselves to considering the case 
where regression is linear. 

Clearly, although the set of points $(x_i, \bar{y}_i)$ tend to lie on a
straight line, they do not do so exactly and our problem is to
find that line about which they cluster most closely. Assume
the line to be $\bar{y}_i = A x_i + B$, where A and B are constants to
be determined accordingly. To do this we use the Method of
Least Squares.

The value of $\bar{y}_i$ corresponding to $x_i$ should be $A x_i + B$.
The difference between the actual value of $\bar{y}_i$ and this estimated
value is $\bar{y}_i - A x_i - B$. Now to each $x_i$ there correspond $f_{i\cdot}$
values of y. To allow for this fact we form the sum

$$S_{\bar{y}}^2 = \sum_i f_{i\cdot} (\bar{y}_i - A x_i - B)^2 \tag{6.6.1}$$

Now since all the terms making up $S_{\bar{y}}^2$ are non-negative, $S_{\bar{y}}^2 = 0$ if
and only if all the means of the y-arrays lie on this line. Thus
$S_{\bar{y}}^2$ would appear to be a satisfactory measure of the overall
discrepancy between the set of points $(x_i, \bar{y}_i)$ and this theoretical
straight line. The "best" line we can draw, using this
criterion (there are others), so that these points cluster most
closely about it will, then, be that line for which $S_{\bar{y}}^2$ is a
minimum.

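Now $S_{\bar{y}}^2$ is a function of the two quantities A and B. To
find the values of A and B which minimise $S_{\bar{y}}^2$ we equate to
zero the two partial derivatives of $S_{\bar{y}}^2$ (see Abbott, Teach
Yourself Calculus, Chapter XVIII, for an introduction to
partial differentiation), with respect to A and B. Thus we
have

$$\frac{\partial}{\partial A}(S_{\bar{y}}^2) = -2 \sum_i f_{i\cdot} (\bar{y}_i - A x_i - B) x_i = 0 \tag{6.6.2}$$

and

$$\frac{\partial}{\partial B}(S_{\bar{y}}^2) = -2 \sum_i f_{i\cdot} (\bar{y}_i - A x_i - B) = 0 \tag{6.6.3}$$

The latter equation gives us, remembering that $f_{i\cdot} = \sum_j f_{ij}$ and
that $f_{i\cdot} \bar{y}_i = \sum_j f_{ij} y_j$,

$$\sum_i \sum_j f_{ij} y_j - A \sum_i \sum_j f_{ij} x_i - B \sum_i \sum_j f_{ij} = 0 \tag{6.6.4}$$

Dividing through by $N = \sum_i \sum_j f_{ij}$, we have

$$\bar{y} = A\bar{x} + B \tag{6.6.5}$$

showing that the mean of the sample $(\bar{x}, \bar{y})$ lies on the line
$\bar{y}_i = A x_i + B$.

Likewise (6.6.2) may be written

$$\sum_i \sum_j f_{ij} x_i y_j - A \sum_i \sum_j f_{ij} x_i^2 - B \sum_i \sum_j f_{ij} x_i = 0 \tag{6.6.6}$$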

and, again dividing by N, this is

$$m_{11}' - A m_{20}' - B\bar{x} = 0 \tag{6.6.7}$$

Solving for A between (6.6.5) and (6.6.7), we have

$$A = \frac{m_{11}' - \bar{x}\bar{y}}{m_{20}' - \bar{x}^2} = \frac{m_{11}}{m_{20}} = \frac{s_{xy}}{s_x^2} \tag{6.6.8}$$

Finally, subtracting (6.6.5) from $\bar{y}_i = A x_i + B$, we have

$$\bar{y}_i - \bar{y} = \frac{s_{xy}}{s_x^2} (x_i - \bar{x}) \tag{6.6.9}$$

as the line about which the set of means of the y-arrays
cluster most closely, the line of regression of y on x. Now,
neglecting suffixes and bars, (6.6.9) may be written

$$(y - \bar{y})/s_y = (s_{xy}/s_x s_y)(x - \bar{x})/s_x$$

If, therefore, we transfer our origin to the mean of the
sample distribution and measure the deviation of each variate
from the mean in terms of the standard deviation of that
variate as unit, i.e., if we put

$$Y = (y - \bar{y})/s_y \quad \text{and} \quad X = (x - \bar{x})/s_x$$

the equation of our regression line of y on x is

$$Y = (s_{xy}/s_x s_y) X \tag{6.6.10}$$

and this may be further simplified by putting

$$r = s_{xy}/s_x s_y \tag{6.6.11}$$

Thus

$$Y = rX \tag{6.6.12}$$

and in this form the regression line of Y on X gives us a
measure, r, of the change in Y for unit change in X.

The line of regression of x on y is immediately obtained by
interchanging x and y, thus

$$\bar{x}_j - \bar{x} = \frac{s_{xy}}{s_y^2} (y_j - \bar{y}) \tag{6.6.9a}$$

or $X = rY$, i.e., $Y = (1/r)X$ (6.6.12a)

We now have our two regression lines, one with gradient $r$,
the other with gradient $1/r$, passing through the mean of the
sample distribution (see Fig. 6.6). The angle between them,
$\theta$, is obtained by using a well-known formula of trigonometry
(see Abbott, Teach Yourself Trigonometry, p. 101), viz.,
$\tan\theta = (g_2 - g_1)/(1 + g_1 g_2)$ for two lines of gradients $g_1$ and $g_2$.

In this case,

$$\tan\theta = (1/r - r)/(1 + (1/r) \cdot r) = (1 - r^2)/2r \tag{6.6.13}$$

We see immediately that if $r^2 = 1$, i.e., $r = \pm 1$, $\theta = 0$ and
the two lines coincide. This means that in the sample the two
variates x and y are functionally related by a linear relationship
and in the population which has been sampled it may also be
the case. We say, then, that the two variates are perfectly
correlated, positively if $r = +1$ and negatively if $r = -1$.
On the other hand, if $r = 0$, $\theta = 90°$, and there is no linear functional
relationship between the variates in the sample and hence,
probably, little or none in the parent population: the variates
are uncorrelated. It is natural, therefore, that, when regression
is linear or assumed to be linear, we should regard r as a measure
of the degree to which the variates are related in the sample by
a linear functional relationship. We accordingly call r the
sample coefficient of product-moment correlation of x and y or,
briefly, the sample correlation-coefficient.

The gradient of the line given by (6.6.9), the regression line
of y on x, is $s_{xy}/s_x^2$ or, as it is often written, cov $(x, y)$/var $(x)$.
This quantity, called the sample coefficient of regression of y
on x, is denoted by $b_{yx}$; similarly, $s_{xy}/s_y^2$ or cov $(x, y)$/var $(y)$
is the coefficient of regression of x on y and is denoted by $b_{xy}$. It follows that

$$b_{yx} \cdot b_{xy} = s_{xy}^2/s_x^2 s_y^2 = r^2 \tag{6.6.14}$$

It should be noted that regression is a relation of dependence
and is not symmetrical, while correlation is one of interdependence
and is symmetrical.
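
As an illustration of (6.6.8), (6.6.11) and (6.6.14), the sketch below computes the two regression coefficients and r from a bivariate frequency table. It is only a sketch under stated assumptions: the function name `regression_summary` and the small table are hypothetical and do not come from the text.

```python
import math


def regression_summary(xs, ys, freq):
    """Return (x_bar, y_bar, b_yx, b_xy, r) for a bivariate frequency table.

    freq[i][j] is the frequency of the pair (xs[i], ys[j]).
    """
    cells = [(x, y, f) for x, row in zip(xs, freq) for y, f in zip(ys, row)]
    n = sum(f for _, _, f in cells)
    x_bar = sum(f * x for x, _, f in cells) / n
    y_bar = sum(f * y for _, y, f in cells) / n
    var_x = sum(f * (x - x_bar) ** 2 for x, _, f in cells) / n
    var_y = sum(f * (y - y_bar) ** 2 for _, y, f in cells) / n
    cov = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in cells) / n
    b_yx = cov / var_x                    # gradient of y on x, (6.6.8)
    b_xy = cov / var_y                    # gradient of x on y
    r = cov / math.sqrt(var_x * var_y)    # (6.6.11)
    return x_bar, y_bar, b_yx, b_xy, r


# Hypothetical table, for illustration only.
x_bar, y_bar, b_yx, b_xy, r = regression_summary(
    [1, 2, 3], [10, 20], [[2, 1], [1, 2], [0, 4]])
intercept = y_bar - b_yx * x_bar          # B from (6.6.5)
print(b_yx, b_xy, r, intercept)
```

The product `b_yx * b_xy` equals `r ** 2` up to rounding, as (6.6.14) requires, and the fitted line passes through the sample mean.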
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6.7. Standard Error of Estimate. We must now examine 
r in a little more detail so as to substantiate our statement that
it is a measure of the degree to which the association between x
and y in the sample does tend toward a linear functional
relationship.

Taking the mean of the distribution as our origin of
co-ordinates, we may write the equation of the line of regression
of y on x in the form $y = b_{yx} x$. We now form the sum of the
squared deviations not of the array-means from the corresponding
values predicted from this equation, but of all the
points $(x_i, y_j)$ from the points on the line $y = b_{yx} x$ corresponding
to $x = x_i$, i.e., from $(x_i, b_{yx} x_i)$. Remembering that the frequency
of $(x_i, y_j)$ is $f_{ij}$, the total sum of squared deviations will be

$$\sum_i \sum_j f_{ij} (y_j - b_{yx} x_i)^2 = N S_y^2, \text{ say.} \tag{6.7.1}$$

Then

$$N S_y^2 = \sum_i \sum_j f_{ij} y_j^2 - 2 b_{yx} \sum_i \sum_j f_{ij} x_i y_j + b_{yx}^2 \sum_i \sum_j f_{ij} x_i^2$$

$$= N s_y^2 - 2 b_{yx} N s_{xy} + N b_{yx}^2 s_x^2$$

Since $b_{yx} = s_{xy}/s_x^2$, we have

$$N S_y^2 = N s_y^2 (1 - s_{xy}^2/s_x^2 s_y^2) = N s_y^2 (1 - r^2)$$

or

$$S_y^2 = s_y^2 (1 - r^2) \tag{6.7.2}$$

or

$$r^2 = 1 - S_y^2/s_y^2 \tag{6.7.3}$$

$S_y$ is called the Standard Error of Estimate of y from (6.6.9).
Likewise, $S_x$, where $S_x^2 = s_x^2 (1 - r^2)$, is called the Standard
Error of Estimate of x.
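
The identity (6.7.2) can be verified on any data set. The sketch below does so for a handful of hypothetical points, each taken with frequency 1; the data and the variable names are illustrative assumptions only.

```python
import math

# Hypothetical observations (x, y), each with frequency 1.
points = [(1, 2.0), (2, 2.5), (3, 4.5), (4, 4.0), (5, 6.0)]
n = len(points)

x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n
var_x = sum((x - x_bar) ** 2 for x, _ in points) / n
var_y = sum((y - y_bar) ** 2 for _, y in points) / n
cov = sum((x - x_bar) * (y - y_bar) for x, y in points) / n

b_yx = cov / var_x
r = cov / math.sqrt(var_x * var_y)

# Mean squared deviation of the points from the regression line of y on x,
# with the origin taken at the mean as in (6.7.1).
S_y_sq_direct = sum((y - y_bar - b_yx * (x - x_bar)) ** 2
                    for x, y in points) / n

# The same quantity from (6.7.2).
S_y_sq_formula = var_y * (1 - r ** 2)

print(S_y_sq_direct, S_y_sq_formula, math.sqrt(S_y_sq_formula))
```

The last printed value is the standard error of estimate of y for these points.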

Since $S_y^2$ is the mean of a sum of squares, it can never be
negative. Therefore (6.7.2) shows that $r^2$ cannot be greater
than 1. When $r = \pm 1$, $S_y^2 = 0$ and so every deviation,
$(y_j - b_{yx} x_i) = 0$; this means that every point representing an
observed value, every $(x_i, y_j)$, lies on the regression line of y
upon x. But if $r = \pm 1$, the line of regression of y on x and
that of x on y coincide. Consequently, all the points representing
the different observed values $(x_i, y_j)$, and not just the
points representing the array-means, lie on a single line. There
is then a straight-line relationship between the variates in the
sample and the correlation in the sample is perfect. We see
then that the nearer $r^2$ approaches unity, the more closely the
observed values cluster about the regression lines and the
closer these lines lie to each other. r is therefore a measure of
the extent to which any relationship there may be between the
variates tends towards linearity.

Exercise: Show that the line of regression of y on x as defined above
is also the line of minimum mean square deviation for the set of
points $(x_i, y_j)$.

6.8. Worked Example. 

The marks, x and y, gained by 1,000 students for theory and
laboratory work respectively, are grouped with common class
intervals of 5 marks for each variable, the frequencies for the
various classes being shown in the correlation-table below. The
values of x and y indicated are the mid-values of the classes.
Show that the coefficient of correlation is 0.68 and the regression
equation of y on x is y = 29.7 + 0.656x. (Weatherburn.)

[Correlation table not recoverable from this copy; only the marginal
totals 244, 189, 137, 55, 21 and the grand total 1,000 survive.]



Treatment: (1) To simplify working we select a working mean and
new units. We take our working mean at (57, 67) and since the
class-interval for both variates is 5 marks, we take new variates:

$$X = (x - 57)/5; \quad Y = (y - 67)/5$$

What will be the effect of these changes on our calculations?
Consider a general transformation,

$$x = a + cX; \quad y = b + dY; \quad \text{then} \quad \bar{x} = a + c\bar{X}; \quad \bar{y} = b + d\bar{Y}$$

We have:

$$s_x^2 = \frac{1}{N} \sum_i f_{i\cdot} (x_i - \bar{x})^2 = \frac{c^2}{N} \sum_i f_{i\cdot} (X_i - \bar{X})^2$$

or

$$s_x^2 = c^2 s_X^2 \quad \text{and, likewise,} \quad s_y^2 = d^2 s_Y^2 \tag{6.8.1}$$

Also

$$s_{xy} = \frac{1}{N} \sum_i \sum_j f_{ij} (x_i - \bar{x})(y_j - \bar{y}) = \frac{cd}{N} \sum_i \sum_j f_{ij} (X_i - \bar{X})(Y_j - \bar{Y}) = cd\, s_{XY} \tag{6.8.2}$$

$$r = \frac{s_{xy}}{s_x s_y} = \frac{cd\, s_{XY}}{c s_X \cdot d s_Y} = \frac{s_{XY}}{s_X s_Y} \tag{6.8.3}$$

Again

$$b_{yx} = s_{xy}/s_x^2 = cd\, s_{XY}/c^2 s_X^2 = (d/c)\, b_{YX}$$

and, similarly,

$$b_{xy} = (c/d)\, b_{XY} \tag{6.8.4}$$

We conclude, therefore, that such a transformation of variates
does not affect our method of calculating r, while, if, as in the present
case, the new units are equal (here c = d = 5), the regression
coefficients may also be calculated directly from the new correlation
table.

(2) We now set out the table on page 100. 

(3) (i) $\bar{X} = 0.239$; $\bar{Y} = 0.177$. Consequently, $\bar{x} = 57 + 5 \times 0.239
= 58.195$, and $\bar{y} = 67 + 5 \times 0.177 = 67.885$.

(ii) $\frac{1}{N} \sum\sum f_{ij} X_i^2 = 2.541$. $\therefore s_X^2 = \frac{1}{N} \sum\sum f_{ij} X_i^2 - \bar{X}^2
= 2.541 - (0.239)^2 = 2.484$, and $s_X = 1.576$. Consequently, although
we do not require it here, $s_x = 5 \times 1.576 = 7.880$.

(iii) $\frac{1}{N} \sum\sum f_{ij} Y_j^2 = 2.339$. $\therefore s_Y^2 = 2.339 - (0.177)^2 = 2.308$
and $s_Y = 1.519$. Consequently, $s_y = 5 \times 1.519 = 7.595$.

(iv) $\frac{1}{N} \sum\sum f_{ij} X_i Y_j = 1.671$. $\therefore s_{XY} = \frac{1}{N} \sum\sum f_{ij} X_i Y_j - \bar{X}\bar{Y}
= 1.671 - (0.239)(0.177) = 1.629$, giving $s_{xy}$ (not here required)
$= 5^2 \times 1.629 = 40.725$.

(v) The correlation coefficient, r, is given by

$$r = s_{XY}/s_X s_Y = 1.629/(1.576 \times 1.519) = 0.680.$$

(vi) The line of regression of y on x is given by

$$y - \bar{y} = b_{yx}(x - \bar{x}).$$

But $b_{yx} = s_{XY}/s_X^2 = 1.629/2.484 = 0.656$.

Hence the required equation is

$$y = 0.656x + 29.696.$$
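
The arithmetic of steps (i) to (vi) can be retraced in a few lines. The sketch below starts from the summary figures quoted in the working (the means 0.239 and 0.177 and the raw second moments 2.541, 2.339 and 1.671 in working units) and applies (6.8.1) to (6.8.4) with a = 57, b = 67, c = d = 5. The variable names are illustrative; small differences in the last figures against the printed values come from rounding at intermediate steps.

```python
import math

# Summary figures in working units, as quoted in the treatment above.
X_bar, Y_bar = 0.239, 0.177
m20_raw, m02_raw, m11_raw = 2.541, 2.339, 1.671

# Transformation x = a + cX, y = b + dY used in the example.
a, b, c, d = 57, 67, 5, 5

var_X = m20_raw - X_bar ** 2
var_Y = m02_raw - Y_bar ** 2
cov_XY = m11_raw - X_bar * Y_bar

r = cov_XY / math.sqrt(var_X * var_Y)   # unchanged by the transformation
b_yx = (d / c) * cov_XY / var_X         # (6.8.4)

x_bar = a + c * X_bar
y_bar = b + d * Y_bar
intercept = y_bar - b_yx * x_bar        # line passes through (x_bar, y_bar)

print(round(r, 3), round(b_yx, 3), round(intercept, 1))
```

To the accuracy asked for in the question, this reproduces r = 0.68 and the line y = 29.7 + 0.656x.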

Exercise: For the data of the above example, find the regression
equation of x on y and the angle between the two regression lines.
Use (6.6.13) and the value of tan θ to find r.

Check and Alternative Method of Finding the Product-moment:
It will be noticed that along each diagonal line running from top
right to bottom left of the correlation table, the value of X + Y
is constant. Thus along the line from X = 4, Y = -3 to X = -3,
Y = 4, X + Y = 1. Likewise, along each diagonal line from top-left
to bottom-right, the value of X - Y is constant. For the line
running from X = -3, Y = -3 to X = 4, Y = 4, for example,
X - Y = 0.

Now

$$\sum\sum f_{ij} (X_i + Y_j)^2 = \sum\sum f_{ij} X_i^2 + \sum\sum f_{ij} Y_j^2 + 2 \sum\sum f_{ij} X_i Y_j$$

and

$$\sum\sum f_{ij} (X_i - Y_j)^2 = \sum\sum f_{ij} X_i^2 + \sum\sum f_{ij} Y_j^2 - 2 \sum\sum f_{ij} X_i Y_j$$

If then, as in the present case, the entries in the table cluster around
the leading diagonal, we may use the second of these identities to
find the product-moment of X and Y. Tabulating, we have



    X_i - Y_j           -3    -2    -1     0     1     2     3
    f                   17    85   209   346   217   103    23
    (X_i - Y_j)^2        9     4     1     0     1     4     9
    f(X_i - Y_j)^2     153   340   209     0   217   412   207

$\sum\sum f_{ij} X_i^2 = 2{,}541$. $\sum\sum f_{ij} Y_j^2 = 2{,}339$.

From the table, $\sum\sum f_{ij} (X_i - Y_j)^2 = 1{,}538$. Therefore, $\sum\sum f_{ij} X_i Y_j
= \frac{1}{2}(2{,}541 + 2{,}339 - 1{,}538) = 1{,}671$.

If the entries cluster about the other diagonal, the first of the 
above two identities is the more convenient with which to work. 



6.9. Rank Correlation. Suppose we have n individuals 
which, in virtue of some selected characteristic A, may be 
arranged in order, so that to each individual a different ordinal 
number is assigned. The n individuals are then said to be 
ranked according to the characteristic A, and the ordinal 
number assigned to an individual is its rank. For example,
"seeded" entries for the Wimbledon lawn-tennis championships
are ranked: they are "seeded" 1, 2, 3 (first, second and
third) and so on.

The concept of rank is useful in the following ways : 

(a) We may reduce the arithmetic involved in investigating
the correlation between two variates if, for each variate
separately, we first rank the given values and then
calculate the product-moment correlation coefficient from
these rank-values. In this way we have an approximation
to the correlation coefficient, r.

(b) We may wish to estimate how good a judge of some
characteristic of a number of objects a man is by asking
him to rank them and then comparing his ranking with
some known objective standard.

(c) We may wish to investigate the degree of agreement
between two judges of the relative merits of a number of
objects each possessing some characteristic for which
there is no known objective standard of measurement
(e.g., two judges at a "Beauty Competition").

Assume a set of n individuals, $a_1, a_2, \dots, a_n$, each possessing
two characteristics x and y, such that the n individuals may be
ranked according to x and, separately, according to y. Let
the rankings be as follows:

    Individual    a_1   a_2   ...   a_i   ...   a_n
    x-Rank        x_1   x_2   ...   x_i   ...   x_n
    y-Rank        y_1   y_2   ...   y_i   ...   y_n

Here $x_1, x_2, \dots, x_i, \dots, x_n$ are the numbers 1, 2, 3 ... n in
some order without repetitions or gaps. Likewise the y's.

Now if there were perfect correlation between the rankings
we should have, for all i, $x_i = y_i$. If this is not so, write
$x_i - y_i = d_i$. Then

$$\sum_i d_i^2 = \sum_i (x_i - y_i)^2 = \sum_i x_i^2 + \sum_i y_i^2 - 2 \sum_i x_i y_i$$

But $\sum_i x_i^2 = \sum_i y_i^2 = 1^2 + 2^2 + 3^2 + \dots + n^2$, the sum of
the squares of the first n natural numbers. Consequently,
$\sum_i x_i^2 = \sum_i y_i^2 = n(n + 1)(2n + 1)/6$. Therefore,

$$\sum_i x_i y_i = \tfrac{1}{2}\left(\sum_i x_i^2 + \sum_i y_i^2\right) - \tfrac{1}{2} \sum_i d_i^2 = n(n + 1)(2n + 1)/6 - \tfrac{1}{2} \sum_i d_i^2.$$

Now

$$\text{cov}(x, y) = \frac{1}{n} \sum_i x_i y_i - \bar{x}\bar{y}$$

and

$$\text{var } x = \text{var } y = \frac{1}{n}(1^2 + 2^2 + \dots + n^2) - \bar{y}^2.$$

But $\bar{x} = \bar{y} = (1 + 2 + 3 + \dots + n)/n = (n + 1)/2$.
Therefore,

$$\text{cov}(x, y) = (n + 1)(2n + 1)/6 - \left(\sum_i d_i^2\right)/2n - (n + 1)^2/4 = (n^2 - 1)/12 - \left(\sum_i d_i^2\right)/2n$$

and

$$\text{var } x = (n + 1)(2n + 1)/6 - (n + 1)^2/4 = (n^2 - 1)/12$$

So

$$\text{cov}(x, y)/(\text{var } x \cdot \text{var } y)^{1/2} = \text{cov}(x, y)/\text{var } x = 1 - 6\left(\sum_i d_i^2\right)/(n^3 - n) \tag{6.9.1}$$

t 

This is Spearman's coefficient of rank correlation, R. If
$\sum_i d_i^2 = 0$, R = 1, and there is perfect correlation by rank
(since $d_i = 0$ for all i, and, so, for all i, $x_i = y_i$).

What happens, however, if the two rankings are exactly the
reverse of one another? This is the most unfavourable case
and, consequently, $\sum_i d_i^2$ is a maximum, while, since for all i,
$x_i + y_i$ will be equal to n + 1, $\sum_i (x_i + y_i)^2 = n(n + 1)^2$. We
have, then, $2 \sum_i x_i y_i = n(n + 1)^2 - n(n + 1)(2n + 1)/3$ or
$\sum_i x_i y_i = n(n + 1)(n + 2)/6$. Cov $(x, y)$ is $-(n^2 - 1)/12$
and var $(x)$ is $+(n^2 - 1)/12$. $\therefore R = -1$. Thus R varies
between the two limits $\pm 1$.

Worked Example 1: The figures in the following table give the number
of criminal convictions (in thousands) and the numbers unemployed
(in millions) for the years 1924-33. Find the coefficient of rank-
correlation.



Year ] 1034 


IMS 


1020 


1027 


1928 


1920 


1930 


1031 


1933 


1933 


Number convicted 
of crime , | 7-88 


8-13 


7-86 


7-36 


7-44 


7-22 


8-28 


8-83 


10-M 


9-46 


Number of unem- 
ployed , 


1-36 


134 


1-43 


1-10 


1-31 


1-34 


3-5 


287 


3-78 


2-26 


Treatment:

We rank the data, thus:

    Year                1924  1925  1926  1927  1928  1929  1930  1931  1932  1933
    Number convicted       6     5     7     9     8    10     4     3     1     2
    Number unemployed      8     9     5    10     7     6     3     2     1     4
    d_i^2                  4    16     4     1     1    16     1     1     0     4

We have $\sum_i d_i^2 = 48$, and, since n = 10, $n^3 - n = 990$. Consequently,

$$R = 1 - 6 \times \frac{48}{990} = 0.709$$

Exercise : Find r, the product-moment correlation coefficient, for the 
above data. 



Worked Example 2: Two judges in a baby-competition rank the 12
entries as follows:

    X     1   2   3   4   5   6   7   8   9  10  11  12
    Y    12   9   6  10   3   5   4   7   8   2  11   1

What degree of agreement is there between the judges?

Treatment: Here we have no objective information about the
babies, but the coefficient of rank correlation will tell us something
about the judges. We have

$$\sum_i d_i^2 = 416, \quad n^3 - n = 1{,}716$$

Thus, R = -0.455, indicating that the judges have fairly strongly
divergent likes and dislikes where babies are concerned!
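
Formula (6.9.1) is easily put into code. The following sketch assumes rankings without ties and takes the sixth entry of judge Y's ranking as 5, so that each rank from 1 to 12 occurs once (this is what the sum 416 requires). The function names are illustrative, and the year-by-year ranks used for Worked Example 1 are those whose squared differences total 48.

```python
import math


def spearman_r(x_ranks, y_ranks):
    """Spearman's R from (6.9.1); both arguments must be permutations of 1..n."""
    n = len(x_ranks)
    expected = list(range(1, n + 1))
    if sorted(x_ranks) != expected or sorted(y_ranks) != expected:
        raise ValueError("rankings must use each of 1..n exactly once")
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def product_moment(xs, ys):
    """Ordinary product-moment correlation, for comparison."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


# Worked Example 1: ranks of convictions and of unemployment, 1924-33.
convicted = [6, 5, 7, 9, 8, 10, 4, 3, 1, 2]
unemployed = [8, 9, 5, 10, 7, 6, 3, 2, 1, 4]

# Worked Example 2: the two judges.
judge_x = list(range(1, 13))
judge_y = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]

print(round(spearman_r(convicted, unemployed), 3))
print(round(spearman_r(judge_x, judge_y), 3))
```

Applied to the ranks themselves, `product_moment` gives the same value as `spearman_r`, which is exactly how (6.9.1) was derived.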

6.10. Kendall's Coefficient. A second coefficient of rank
correlation has been suggested by M. G. Kendall. Consider
the rankings in Example 2 above. If we take the first number
of the second ranking, 12, with each succeeding number, we
shall have 11 number-pairs. To each pair, (a, b), say, allot
the score 1 if a < b, and the score -1 if a > b. Thus for
the 11 pairs (a, b), where a = 12, the total score is -11.
Now consider the 10 pairs (a, b), where a = 9, viz., (9, 6),
(9, 10), (9, 3), etc. The total score for this set is -1 + 1 - 1
- 1 - 1 - 1 - 1 - 1 + 1 - 1 = -6. Continuing in this
way, we obtain the 11 following scores, -11, -6, -1, -6,
3, 0, 1, 0, -1, 0, -1, totalling -22. Had the numbers been
in their natural order, as in the upper ranking, the total score
would have been 11 + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 +
2 + 1 = 66.

The Kendall rank correlation coefficient, τ, is the ratio of
the actual to the maximum score, i.e., in this case,

$$\tau = -22/66 = -0.33$$

The Spearman coefficient, R, for the same data was -0.455.

Generally, if there are n individuals in the ranking, the Kendall
coefficient is

$$\tau = 2S/n(n - 1) \tag{6.10.1}$$

where S is the actual score calculated according to the method used
above.

A shorter method of calculating S is:

In the second ranking, the figure 1 has 0 numbers to its
right and 11 to its left. Allot the score 0 - 11 = -11
and cross out the 1. 2 has 1 number to its right and 9
numbers to its left: the score, therefore, allotted is
1 - 9 = -8; cross out 2. 3 has 5 numbers to its right and
4 to its left; score is 5 - 4 = 1; cross out 3. Continue
in this way, obtaining the set of scores: -11, -8, 1,
-2, -1, 2, -1, -2, 1, 0, -1, the total of which is
S = -22.

Alternatively, we may set down the two rankings, one of
which is in natural order, one above the other, and join 1 to 1,
2 to 2, 3 to 3 and so on. Then if we count the number of
intersections (care must be taken not to allow any two such
intersections to coincide), N, say, S will be given by

$$S = n(n - 1)/2 - 2N$$

and, therefore,

$$\tau = 2S/n(n - 1) = 1 - 4N/n(n - 1) \tag{6.10.2}$$

Like Spearman's R, Kendall's τ is +1 when the correspondence
between the rankings is perfect, and -1 only if
one is the inverse of the other. When n is large τ is about
2R/3.
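
The three ways of obtaining S described above can be compared in code. The sketch below assumes a ranking without ties set against the natural order; the function names are illustrative, and the count of intersections N is computed as the number of pairs standing in inverted order, which is the same thing when no two intersections coincide.

```python
def kendall_score(ranking):
    """S: +1 for each later number that is larger, -1 for each that is smaller."""
    score = 0
    for i, a in enumerate(ranking):
        for b in ranking[i + 1:]:
            score += 1 if a < b else -1
    return score


def kendall_score_crossing_out(ranking):
    """The shorter method: take 1, 2, 3, ... in turn, score (right - left), cross out."""
    remaining = list(ranking)
    score = 0
    for value in sorted(ranking):
        pos = remaining.index(value)
        to_right = len(remaining) - 1 - pos
        score += to_right - pos
        remaining.pop(pos)
    return score


def inversions(ranking):
    """N: the number of pairs (a, b), a before b, with a > b."""
    return sum(1 for i, a in enumerate(ranking)
               for b in ranking[i + 1:] if a > b)


def kendall_tau(ranking):
    n = len(ranking)
    return 2 * kendall_score(ranking) / (n * (n - 1))   # (6.10.1)


judge_y = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]
n = len(judge_y)
print(kendall_score(judge_y), kendall_score_crossing_out(judge_y))
print(n * (n - 1) // 2 - 2 * inversions(judge_y))       # S from the intersections
print(round(kendall_tau(judge_y), 2))
```

All three routes give the same score for the judges' ranking, and hence the same τ.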

Worked Example: Show that the values of τ between the natural
order 1, 2, ... 10 and the following rankings are -0.24 and
0.60:

    7, 10, 4, 1, 6, 8, 9, 5, 2, 3
    10, 1, 2, 3, 4, 5, 6, 7, 8, 9

Find also τ between the two rankings as they stand. (Modified
from M. G. Kendall, Advanced Theory of Statistics, Vol. I,
p. 437.)

Treatment: (1) Consider the first ranking. Using the short
method of calculating S, we have

    S = (6 - 3) + (1 - 7) + (0 - 7) + (4 - 2) + (0 - 5)
        + (2 - 2) + (3 - 0) + (1 - 1) + (0 - 1) = -11

Hence τ = -11/45 = -0.24.

(2) Using the alternative short method for the second ranking we
have:

    1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    10, 1, 2, 3, 4, 5, 6, 7, 8, 9

and, obviously, N = 9, the number of inversions of the natural
order. Thus

    τ = 1 - 4 × 9/90 = 3/5 = 0.60

(3) To find τ between the two rankings, rearrange both so that one
is in the natural order. Here it is easier to put the second in that
order:

    10, 4, 1, 6, 8, 9, 5, 2, 3, 7
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Then S = -5 and τ = -5/45 = -0.11.

Exercise: Show that R between the natural order 1, 2, ... 10 and
the above two rankings has the values -0.37 and 0.45 respectively
and that between the two rankings as they stand R = -0.19.

6.11. Coefficient of Concordance. Frequently we need to
investigate the degree of concordance between more than two
rankings. Suppose, for example, we have the following 3
rankings:

    X    1   2   3   4   5   6   7   8   9  10
    Y    7  10   4   1   6   8   9   5   2   3
    Z    9   6  10   3   5   4   7   8   2   1

Summing the columns, we have the sums

        17  18  17   8  16  18  23  21  13  14

Had there been perfect concordance, we should have had

         3   6   9  12  15  18  21  24  27  30

and the variance of these numbers would then have been a
maximum. But when, as in the present case, there is little
concordance, the variance is small. It is reasonable, therefore,
to take the ratio of the variance of the actual sums to the variance
in the case of perfect concordance as a measure of rank-concordance.

The mean of each ranking is (n + 1)/2 in the general case;
therefore, if there are m rankings, the mean of the sums will
be m(n + 1)/2. With perfect concordance, these sums will
be m, 2m, 3m, . . . nm and their variance, then,

$$m^2(1^2 + 2^2 + 3^2 + \dots + n^2)/n - m^2(n + 1)^2/4 = m^2(n^2 - 1)/12$$

Let S be the sum of the squared deviations of the actual sums
from their mean, m(n + 1)/2. We define the coefficient of
concordance, W, between the rankings, by

$$W = (S/n) \big/ \left[ m^2(n^2 - 1)/12 \right] = 12S/m^2 n(n^2 - 1) \tag{6.11.1}$$

Clearly, W varies between 0 and 1 . 

In the case of the three rankings given, m = 3, n = 10, the mean
of the column sums is 16.5 and the squared deviations of the ten
sums from it total S = 158.5, so that

$$W = 12 \times 158.5/(9 \times 990) = 0.213$$

It may be shown (see Kendall, Advanced Theory of Statistics,
vol. 1, p. 411) that if $R_{av.}$ denote the average of Spearman's R
between all possible pairs of rankings,

$$R_{av.} = (mW - 1)/(m - 1) \tag{6.11.2}$$
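
A short sketch of (6.11.1) and of the check (6.11.2) follows. It assumes rankings without ties, reads the three rankings as permutations of 1 to 10 consistent with the printed column sums, and uses illustrative function names that are not part of the text.

```python
from itertools import combinations


def concordance(rankings):
    """Coefficient of concordance W of (6.11.1) for m rankings of n individuals."""
    m = len(rankings)
    n = len(rankings[0])
    sums = [sum(column) for column in zip(*rankings)]
    mean = m * (n + 1) / 2
    S = sum((t - mean) ** 2 for t in sums)
    return 12 * S / (m ** 2 * n * (n ** 2 - 1))


def spearman_r(x_ranks, y_ranks):
    """Spearman's R from (6.9.1), assuming no ties."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [7, 10, 4, 1, 6, 8, 9, 5, 2, 3]
Z = [9, 6, 10, 3, 5, 4, 7, 8, 2, 1]
rankings = [X, Y, Z]

m = len(rankings)
W = concordance(rankings)
pairs = list(combinations(rankings, 2))
R_average = sum(spearman_r(a, b) for a, b in pairs) / len(pairs)

print(W, R_average, (m * W - 1) / (m - 1))
```

The last two printed numbers agree, which is relation (6.11.2) for these rankings; identical rankings give W = 1.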

Exercise : Verify that (6.11.2) holds in the case of the three rankings 
given at the beginning of this section. 

6.12. Polynomial Regression. So far we have limited our
discussion of regression to bivariate distributions where the
regression curves were straight lines. Such distributions are,
however, the exception rather than the rule, although they are
important exceptions. If, using the notation of 6.6, we plot
$\bar{y}_i$ against $x_i$ (or $\bar{x}_j$ against $y_j$), the line about which these points
tend to cluster most closely is usually curved rather than
straight. When this is the case, the coefficient of correlation,
r, which, it will be recalled, is a measure of the extent to which
any relationship between the variates tends towards linearity,
is no longer a suitable measure of correlation.

The simplest type of non-linear equation is that in which one
of the variates is a polynomial function of the other, viz.,

$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_r x^r + \dots + a_k x^k = \sum_{r=0}^{k} a_r x^r \tag{6.12.1}$$

where the coefficients $a_r$ (r = 0, 1, 2, ... k) are not all zero.
If the regression equation of y on x (or x on y) is of this form,
we have polynomial regression.

Once we have decided upon the degree, k, of the
polynomial, we again use the Method of Least Squares to
determine the coefficients, $a_r$ (r = 0, 1, 2, . . . k).

Referring to Table 6.2.2, let $\bar{y}_i$ be the actual mean of the
$x_i$th array, and let $Y_i$ be the calculated, or predicted, value when
$x = x_i$ is substituted in (6.12.1). (If the data are not grouped
and only one value of y corresponds to a given x, that value of
y is, of course, itself the mean.)

The sum of the squared residuals, $S^2$, is then given by

$$S^2 = \sum_i f_{i\cdot} (\bar{y}_i - Y_i)^2 = \sum_i f_{i\cdot} \left(\bar{y}_i - \sum_r a_r x_i^r\right)^2 \tag{6.12.2}$$

$S^2$ is thus a function of the k + 1 quantities, $a_r$ (r = 0, 1, . . . k).
To find the values of these quantities which minimise $S^2$, we
differentiate $S^2$ partially with respect to each of these quantities
and equate each partial derivative, $\partial S^2/\partial a_r$, to zero. This
gives us k + 1 simultaneous equations in the a's, the normal
equations, $\partial S^2/\partial a_r = 0$ (r = 0, 1, ... k), from which the
required coefficients may be determined.
The following example illustrates the method when k = 2.

Worked Example: The profits, £y, of a certain company in the xth
year of its life are given by:

    x        1       2       3       4       5
    y    1,250   1,400   1,650   1,950   2,300

Find the parabolic regression of y on x.

(Weatherburn.)



Treatment: Put u = x - 3; v = (y - 1,650)/50. Then

      x     u   u^2   u^3   u^4       y     v    vu   vu^2
      1    -2     4    -8    16   1,250    -8    16    -32
      2    -1     1    -1     1   1,400    -5     5     -5
      3     0     0     0     0   1,650     0     0      0
      4     1     1     1     1   1,950     6     6      6
      5     2     4     8    16   2,300    13    26     52
    Sum     0    10     0    34             6    53     21

For parabolic regression of v on u, $v = a + bu + cu^2$ and, so,

$$S^2 = \sum_i (a + bu_i + cu_i^2 - v_i)^2.$$

$$\therefore \partial S^2/\partial a = 2 \sum_i (a + bu_i + cu_i^2 - v_i) = 2\left(na + b \sum_i u_i + c \sum_i u_i^2 - \sum_i v_i\right)$$

$$\partial S^2/\partial b = 2 \sum_i (a + bu_i + cu_i^2 - v_i) u_i = 2\left(a \sum_i u_i + b \sum_i u_i^2 + c \sum_i u_i^3 - \sum_i v_i u_i\right)$$

$$\partial S^2/\partial c = 2 \sum_i (a + bu_i + cu_i^2 - v_i) u_i^2 = 2\left(a \sum_i u_i^2 + b \sum_i u_i^3 + c \sum_i u_i^4 - \sum_i v_i u_i^2\right)$$

In the present example, the normal equations $\partial S^2/\partial a = \partial S^2/\partial b
= \partial S^2/\partial c = 0$ are

    5a + 10c - 6 = 0;  10b - 53 = 0;  10a + 34c - 21 = 0

giving a = -0.086, b = 5.3, c = 0.643.

The regression equation of v on u is, therefore,

$$v = -0.086 + 5.3u + 0.643u^2.$$

Changing back to our old variates, the required regression equation
is

$$y = 1{,}140 + 72.1x + 32.15x^2$$
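
The normal equations of this example can be set up and solved mechanically. The sketch below is one way of doing so, using exact fractions and a small Gauss-Jordan elimination; the helper names `solve_linear_system` and `fit_polynomial` are illustrative assumptions, not anything defined in the text.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a small square system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients a_0..a_k from the normal equations."""
    size = degree + 1
    matrix = [[sum(x ** (r + s) for x in xs) for s in range(size)]
              for r in range(size)]
    rhs = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(size)]
    return solve_linear_system(matrix, rhs)


# Working variates u = x - 3 and v = (y - 1650)/50 from the treatment above.
us = [-2, -1, 0, 1, 2]
vs = [-8, -5, 0, 6, 13]
a, b, c = fit_polynomial(us, vs, 2)
print(float(a), float(b), float(c))

# The same fit carried out directly on the original variates.
xs = [1, 2, 3, 4, 5]
ys = [1250, 1400, 1650, 1950, 2300]
a0, a1, a2 = fit_polynomial(xs, ys, 2)
print(float(a0), float(a1), float(a2))
```

Working in exact fractions shows the coefficients in the working variates to be -3/35, 53/10 and 9/14, and in the original variates 1140, 505/7 and 225/7, which round to the figures quoted.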

6.13. Least Squares and Moments. If we differentiate
(6.12.2) partially with respect to $a_r$, we have

$$\partial S^2/\partial a_r = \sum_i f_{i\cdot} \left(\bar{y}_i - \sum_s a_s x_i^s\right) \cdot (-2x_i^r)$$

and, equating to zero,

$$\sum_i f_{i\cdot} x_i^r \bar{y}_i = \sum_i f_{i\cdot} x_i^r Y_i, \text{ for all } r,$$

showing that:

The process of fitting a polynomial curve of degree k to
a set of data by the method of Least Squares is equivalent
to equating the moments of order 0, 1, 2 ... k of the
polynomial to those of the data.

6.14. Correlation Ratios. The correlation table being (6.2.2),
let the regression equation of y on x be y = y(x). Then
$Y_i = y(x_i)$. If, then, $S_y^2$ is the squared standard error of estimate of
y from this equation, i.e., the mean square deviation of the y's
from the regression curve,

$$N S_y^2 = \sum_i \sum_j f_{ij} (y_j - Y_i)^2 = \sum_i \sum_j f_{ij} (y_j - \bar{y}_i + \bar{y}_i - Y_i)^2$$

$$= \sum_i \sum_j f_{ij} (y_j - \bar{y}_i)^2 + 2 \sum_i \sum_j f_{ij} (y_j - \bar{y}_i)(\bar{y}_i - Y_i) + \sum_i \sum_j f_{ij} (\bar{y}_i - Y_i)^2.$$

Let $n_i = f_{i\cdot} = \sum_j f_{ij}$, the total frequency in the $x_i$th array,
and $s_{y_i}^2$ the variance of the y's in the same array. Then,
since $\sum_j f_{ij} (y_j - \bar{y}_i) = 0$,

$$N S_y^2 = \sum_i n_i s_{y_i}^2 + \sum_i n_i (\bar{y}_i - Y_i)^2 \tag{6.14.1}$$

It follows that if all the means of the x-arrays lie on y = y(x),
i.e., $(\bar{y}_i - Y_i) = 0$, for all i, $S_y^2$ is the mean value of the
variance of y in each x-array, taken over all such arrays.
Consequently, if all the variances are equal, each is equal to $S_y^2$.

When this is the case, the regression of y on x is said to
be homoscedastic (equally scattered).

If the regression is also linear, $S_y^2 = s_y^2 (1 - r^2)$ and so the
standard deviation of each array is $s_y (1 - r^2)^{1/2}$.

Now let $S_y'^2$ be the mean square deviation of the y's from
the mean of their respective arrays. Then

$$N S_y'^2 = \sum_i \sum_j f_{ij} (y_j - \bar{y}_i)^2 = \sum_i \sum_j f_{ij} y_j^2 - 2 \sum_i \sum_j f_{ij} y_j \bar{y}_i + \sum_i \sum_j f_{ij} \bar{y}_i^2$$

$$= \sum_i \sum_j f_{ij} y_j^2 - 2 \sum_i \left[\bar{y}_i \left(\sum_j f_{ij} y_j\right)\right] + \sum_i \sum_j f_{ij} \bar{y}_i^2$$

But $\sum_j f_{ij} y_j = \bar{y}_i \sum_j f_{ij} = n_i \bar{y}_i$, and, therefore,

$$N S_y'^2 = \sum_i \sum_j f_{ij} y_j^2 - \sum_i \sum_j f_{ij} \bar{y}_i^2$$

$$= N m_{02}' - \sum_i n_i \bar{y}_i^2 = N s_y^2 + N\bar{y}^2 - \sum_i n_i \bar{y}_i^2$$

$$= N s_y^2 - \left(\sum_i n_i \bar{y}_i^2 - N\bar{y}^2\right)$$

But $\bar{y}$ is the mean of the array-means, $\bar{y}_i$; therefore the
expression in brackets is N times the variance of the means of
the arrays, which we shall denote by $s_{\bar{y}}^2$. So

$$S_y'^2 = s_y^2 - s_{\bar{y}}^2 \tag{6.14.2}$$

By analogy with (6.7.2), we write this

$$S_y'^2 = s_y^2 (1 - e_{yx}^2) \tag{6.14.3}$$

where

$$e_{yx} = s_{\bar{y}}/s_y \tag{6.14.4}$$

and is called the correlation ratio of y on x. Likewise $e_{xy} =
s_{\bar{x}}/s_x$ is the correlation ratio of x on y. Since both $S_y'^2$ and
$s_y^2$ are positive, (6.14.3) shows that $e_{yx}^2 \le 1$. Moreover, since
the mean-square deviation of a set of quantities from their
mean is a minimum,

$$0 \le S_y'^2 \le S_y^2 \quad \text{or} \quad 0 \le 1 - e_{yx}^2 \le 1 - r^2,$$

i.e., $r^2 \le e_{yx}^2 \le 1$

or

$$0 \le e_{yx}^2 - r^2 \le 1 - r^2. \tag{6.14.5}$$

Since we regard r as a measure of the degree to which any
association between the variates tends towards linearity and
since the residual dispersion, $1 - r^2$, is 0 when $r^2 = 1$, and 1
when $r = 0$, a non-zero value of $e_{yx}^2 - r^2$ may be regarded
tentatively as a measure of the degree to which the regression
departs from linearity. (But see 10.10.)

That $e_{yx}$ is a correlation measure will be appreciated by
noting that, when $e_{yx}^2 = 1$, ${S'_y}^2 = 0$, and, so, all the points
$(x_i, y_j)$ lie on the curve of means, the regression curve,
$y = y(x)$, i.e., there is an exact functional relationship
between the variates.
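The definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function name and the small frequency table are hypothetical, chosen so that the array means rise and then fall. It computes $e_{yx}^2$ as the variance of the array means divided by the variance of y, and $r^2$ from the covariance, so that the inequality (6.14.5) can be seen at work.

```python
def correlation_ratio_sq_and_r_sq(x_vals, y_vals, freq):
    """Return (e_yx^2, r^2) for a two-way frequency table.

    freq[i][j] is the frequency of the pair (x_vals[i], y_vals[j]);
    every x-array (row) is assumed to have a non-zero total.
    """
    n = [sum(row) for row in freq]                      # array totals n_i
    N = sum(n)
    x_bar = sum(n_i * x for n_i, x in zip(n, x_vals)) / N
    y_bar = sum(f * y for row in freq for f, y in zip(row, y_vals)) / N
    var_x = sum(n_i * (x - x_bar) ** 2 for n_i, x in zip(n, x_vals)) / N
    var_y = sum(f * (y - y_bar) ** 2
                for row in freq for f, y in zip(row, y_vals)) / N
    cov = sum(f * (x - x_bar) * (y - y_bar)
              for x, row in zip(x_vals, freq)
              for f, y in zip(row, y_vals)) / N
    # Means of the x-arrays and their (weighted) variance about y_bar.
    array_means = [sum(f * y for f, y in zip(row, y_vals)) / n_i
                   for row, n_i in zip(freq, n)]
    var_of_means = sum(n_i * (m - y_bar) ** 2
                       for n_i, m in zip(n, array_means)) / N
    return var_of_means / var_y, cov ** 2 / (var_x * var_y)


# Hypothetical table: y is high in the middle x-array, low at both ends.
x_vals = [1, 2, 3]
y_vals = [1, 2, 3]
freq = [[4, 1, 0],
        [0, 1, 4],
        [4, 1, 0]]

e_sq, r_sq = correlation_ratio_sq_and_r_sq(x_vals, y_vals, freq)
print(f"e^2 = {e_sq:.3f}, r^2 = {r_sq:.3f}")
```

In this illustrative table the linear correlation vanishes while the correlation ratio is large, which is the situation in which $e_{yx}^2 - r^2$ signals a marked departure from linearity.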

6.15. Worked Example.

Calculate the correlation ratio $e_{yx}$ for the data of 6.8.
Treatment:

(1) $$e_{yx}^2 = s_{\bar{y}}^2/s_y^2 = \Big(\sum_i n_i \bar{y}_i^2 - N\bar{y}^2\Big)\Big/ N s_y^2. \tag{6.15.1}$$

Let $T_i(y)$ be the sum of the y's in the $x_i$th array. Then $T_i(y) = n_i \bar{y}_i$,
and $\sum_i T_i(y) = N\bar{y}$. If $T(y)$ be the sum of the y's in the distribution,
$T(y) = \sum_i T_i(y)$. Then $N\bar{y}^2 = [T(y)]^2/N$ and $\sum_i n_i \bar{y}_i^2 = \sum_i [T_i(y)]^2/n_i$.
Consequently

$$e_{yx}^2 = \Big\{\sum_i \frac{[T_i(y)]^2}{n_i} - \frac{[T(y)]^2}{N}\Big\}\Big/ N s_y^2.$$

From the definition, $e_{yx}^2 = s_{\bar{y}}^2/s_y^2$, we see that, since deviation from
the mean is unchanged by change of origin, $e_{yx}$ is unchanged thereby;
furthermore, since both numerator and denominator involve only
squares, they are changed in the same ratio by any change of unit
employed. Hence $e_{yx}$ is unchanged by change of unit and origin,
and we may, therefore, work with the variables X and Y in our
table of 6.8(b). From that table, then, we see that $T(y)$ is the total
of row F, and $\sum_i [T_i(y)]^2/n_i$ is the sum of those quantities each of
which is the square of a term in row F divided by the corresponding
term in row E.

(2) We have then: $T(y)$ = sum of row F = 177 and N = 1,000,
giving $[T(y)]^2/N = 31.329$. Also

$$\sum_i \frac{[T_i(y)]^2}{n_i} = \frac{37^2}{26} + \frac{113^2}{97} + \cdots + \frac{112^2}{55} + \frac{66^2}{21} + \cdots = 1{,}117 \text{ (to 4 significant figures)},$$

the sum running over the nine arrays, whose frequencies $n_i$ are
26, 97, 226, 244, 189, 137, 55, 21 and 5. Hence

$$e_{yx}^2 = (1{,}117 - 31.329)/N s_Y^2 \quad (s_Y^2 = 2.308 \text{ from 6.8(c)})$$

$$= 0.471$$

or $e_{yx} = 0.686$, approximately.

Since $e_{yx}^2 - r^2$ is less than 0.01, the departure from linearity is small.

6.16. Multivariate Regression. When we have more than
two correlated variates, two major problems present themselves:

(1) we may wish to examine the influence on one of the
variates of the others of the set; this is the problem of
multivariate regression; or

(2) we may be interested in assessing the interdependence of
two of the variates, after eliminating the influence of all
the others; this is the problem of partial correlation.

Here we confine ourselves to the case of three variates,
$x_1, x_2, x_3$, measured from their means, with variances $s_1^2, s_2^2, s_3^2$
respectively.

Since the variates are measured from their means, let the
regression equations be

$$x_1 = b_{12.3}x_2 + b_{13.2}x_3 \tag{6.16.1}$$

$$x_2 = b_{23.1}x_3 + b_{21.3}x_1 \tag{6.16.2}$$

$$x_3 = b_{31.2}x_1 + b_{32.1}x_2 \tag{6.16.3}$$

We shall determine the b's by the method of Least Squares.
Consider (6.16.1). The sum of the squared deviations of
observed values of $x_1$ from the estimated values is given by

$$S^2 = \sum (x_1 - b_{12.3}x_2 - b_{13.2}x_3)^2. \tag{6.16.4}$$

The normal equations are:

$$b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3 = \sum x_1x_2 \tag{6.16.5}$$

$$b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2 = \sum x_1x_3 \tag{6.16.6}$$

Solving these, we have

$$b_{12.3} = \frac{s_1}{s_2}\cdot\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}; \qquad b_{13.2} = \frac{s_1}{s_3}\cdot\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}.$$

Here $r_{12}, r_{23}, r_{31}$ are total correlations: $r_{12}$, for instance, is the
correlation between $x_1$ and $x_2$ formed, by ignoring the values
of $x_3$, in the usual manner.

The coefficients $b_{12.3}$ and $b_{13.2}$ are called partial regression
coefficients. Whereas $r_{ij} = r_{ji}$, $b_{ij.k}$ is not in general equal to
$b_{ji.k}$.

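The reader familiar with determinant notation 1 will realise
that we may simplify these expressions. Let R denote the
determinant

$$R = \begin{vmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{vmatrix}$$

where $r_{11} = r_{22} = r_{33} = 1$, and $r_{ij} = r_{ji}$ for i, j = 1, 2, 3, but $i \ne j$.

Then, if $R_{ij}$ denotes the cofactor of $r_{ij}$ in R, we have

$$R_{11} = 1 - r_{23}^2; \qquad R_{12} = -(r_{12} - r_{13}r_{23}); \qquad R_{13} = r_{12}r_{23} - r_{13},$$

so that

$$b_{12.3} = -\frac{s_1}{s_2}\cdot\frac{R_{12}}{R_{11}}; \qquad b_{13.2} = -\frac{s_1}{s_3}\cdot\frac{R_{13}}{R_{11}};$$

also

$$\left.\begin{aligned} r_{11}R_{11} + r_{12}R_{12} + r_{13}R_{13} &= R \\ r_{21}R_{11} + r_{22}R_{12} + r_{23}R_{13} &= 0 \\ r_{31}R_{11} + r_{32}R_{12} + r_{33}R_{13} &= 0 \end{aligned}\right\} \tag{6.16.9}$$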



The regression equations become:

(1) Regression of $x_1$ on $x_2$ and $x_3$:

$$\frac{R_{11}}{s_1}x_1 + \frac{R_{12}}{s_2}x_2 + \frac{R_{13}}{s_3}x_3 = 0 \tag{6.16.10(a)}$$

(2) Regression of $x_2$ on $x_3$ and $x_1$:

$$\frac{R_{21}}{s_1}x_1 + \frac{R_{22}}{s_2}x_2 + \frac{R_{23}}{s_3}x_3 = 0 \tag{6.16.10(b)}$$

(3) Regression of $x_3$ on $x_1$ and $x_2$:

$$\frac{R_{31}}{s_1}x_1 + \frac{R_{32}}{s_2}x_2 + \frac{R_{33}}{s_3}x_3 = 0 \tag{6.16.10(c)}$$

In the space of the three variates, these equations are
represented by planes. These regression planes should not be
confused with the regression lines: the regression line of $x_1$
on $x_2$ being $x_1 = (r_{12}s_1/s_2)x_2$, for instance.

1 See Mathematical Note at end of Chapter.

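6.17. Multiple Correlation.

Definition: The coefficient of multiple correlation of
$x_1$ with $x_2$ and $x_3$, denoted by $r_{1.23}$, is the coefficient of
product-moment correlation of $x_1$ and its estimate from
the regression equation of $x_1$ on $x_2$ and $x_3$.

That this is a natural definition is clear if we recall that if $x_1$
lies everywhere on the regression plane, there is an exact
functional relationship between the three variates.

From the definition,

$$r_{1.23} = \frac{\operatorname{cov}(x_1,\; b_{12.3}x_2 + b_{13.2}x_3)}{[\operatorname{var}(x_1)\cdot \operatorname{var}(b_{12.3}x_2 + b_{13.2}x_3)]^{1/2}}.$$

(i) $\operatorname{cov}(x_1,\; b_{12.3}x_2 + b_{13.2}x_3)$

$$= \mathcal{E}(b_{12.3}x_1x_2 + b_{13.2}x_1x_3), \text{ since } \bar{x}_1 = \bar{x}_2 = \bar{x}_3 = 0$$

$$= b_{12.3}\,\mathcal{E}(x_1x_2) + b_{13.2}\,\mathcal{E}(x_1x_3)$$

$$= b_{12.3}\operatorname{cov}(x_1x_2) + b_{13.2}\operatorname{cov}(x_1x_3)$$

$$= -\frac{s_1R_{12}}{s_2R_{11}}\, r_{12}s_1s_2 - \frac{s_1R_{13}}{s_3R_{11}}\, r_{13}s_1s_3$$

$$= -\frac{s_1^2}{R_{11}}\,[r_{12}R_{12} + r_{13}R_{13}].$$

Now, using the first equation of (6.16.9),

$$\operatorname{cov}(x_1,\; b_{12.3}x_2 + b_{13.2}x_3) = -\frac{s_1^2}{R_{11}}\,[R - R_{11}] = s_1^2[1 - R/R_{11}], \text{ since } r_{11} = 1.$$

(ii) $\operatorname{var}(x_1) = s_1^2$.

(iii) $\operatorname{var}(b_{12.3}x_2 + b_{13.2}x_3)$

$$= b_{12.3}^2 s_2^2 + b_{13.2}^2 s_3^2 + 2b_{12.3}b_{13.2}r_{23}s_2s_3$$

$$= \frac{s_1^2}{R_{11}^2}\,[R_{12}^2 + R_{13}^2 + 2R_{12}R_{13}r_{23}]$$

$$= \frac{s_1^2}{R_{11}^2}\,[R_{12}(R_{12} + r_{23}R_{13}) + R_{13}(R_{13} + r_{23}R_{12})].$$

Using the second and third equations of (6.16.9),

$$\operatorname{var}(b_{12.3}x_2 + b_{13.2}x_3) = \frac{s_1^2}{R_{11}^2}\,[-R_{12}R_{11}r_{12} - R_{13}R_{11}r_{13}]$$

$$= -\frac{s_1^2}{R_{11}}\,[r_{12}R_{12} + r_{13}R_{13}] = s_1^2[1 - R/R_{11}].$$

Consequently,

$$r_{1.23} = [1 - R/R_{11}]^{1/2} = \left[\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2}\right]^{1/2}. \tag{6.17.1}$$

Likewise

$$r_{2.31} = [1 - R/R_{22}]^{1/2} \quad\text{and}\quad r_{3.12} = [1 - R/R_{33}]^{1/2}.$$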

6.18. Worked Example.

Calculate the multiple correlation coefficient of $x_1$ on $x_2$ and $x_3$,
$r_{1.23}$, from the following data:

    x_1    x_2    x_3
     5      2     21
     3      4     21
     2      2     15
     4      2     17
     3      3     20
     1      2     13
     8      4     32

Find also the regression equation of $x_1$ on $x_2$ and $x_3$.
Treatment: Working from the origins 4, 3 and 20, we put
$X_1 = x_1 - 4$, $X_2 = x_2 - 3$, $X_3 = x_3 - 20$.

    x_1  X_1  X_1^2 | x_2  X_2  X_2^2 | x_3  X_3  X_3^2 | X_1X_2  X_2X_3  X_3X_1
     5   +1     1   |  2   -1     1   | 21   +1     1   |   -1      -1      +1
     3   -1     1   |  4   +1     1   | 21   +1     1   |   -1      +1      -1
     2   -2     4   |  2   -1     1   | 15   -5    25   |   +2      +5     +10
     4    0     0   |  2   -1     1   | 17   -3     9   |    0      +3       0
     3   -1     1   |  3    0     0   | 20    0     0   |    0       0       0
     1   -3     9   |  2   -1     1   | 13   -7    49   |   +3      +7     +21
     8   +4    16   |  4   +1     1   | 32  +12   144   |   +4     +12     +48
    Total -2   32   |      -2     6   |      -1   229   |   +7     +27     +79

$\bar{X}_1 = -\tfrac{2}{7} = -0.286$; $\bar{X}_2 = -\tfrac{2}{7} = -0.286$; $\bar{X}_3 = -\tfrac{1}{7} = -0.143$.
$\tfrac{1}{7}\sum X_1^2 = 4.571$; $\tfrac{1}{7}\sum X_2^2 = 0.857$; $\tfrac{1}{7}\sum X_3^2 = 32.714$;
$\tfrac{1}{7}\sum X_1X_2 = \tfrac{7}{7} = 1$; $\tfrac{1}{7}\sum X_2X_3 = \tfrac{27}{7} = 3.857$; $\tfrac{1}{7}\sum X_3X_1 = \tfrac{79}{7} = 11.286$.

cov $(x_1x_2) = \tfrac{1}{7}\sum X_1X_2 - \bar{X}_1\bar{X}_2 = 1 - (-0.286)(-0.286) = 1 - 0.082 = 0.918$

cov $(x_2x_3) = \tfrac{1}{7}\sum X_2X_3 - \bar{X}_2\bar{X}_3 = 3.857 - (-0.286)(-0.143) = 3.857 - 0.041 = 3.816$

cov $(x_3x_1) = \tfrac{1}{7}\sum X_3X_1 - \bar{X}_3\bar{X}_1 = 11.286 - (-0.286)(-0.143) = 11.286 - 0.041 = 11.245$

var $(x_1) = \tfrac{1}{7}\sum X_1^2 - \bar{X}_1^2 = 4.571 - (-0.286)^2 = 4.571 - 0.082 = 4.489$

var $(x_2) = \tfrac{1}{7}\sum X_2^2 - \bar{X}_2^2 = 0.857 - (-0.286)^2 = 0.857 - 0.082 = 0.775$

var $(x_3) = \tfrac{1}{7}\sum X_3^2 - \bar{X}_3^2 = 32.714 - (-0.143)^2 = 32.714 - 0.020 = 32.694$
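$$r_{12} = \frac{0.918}{[4.489 \times 0.775]^{1/2}} = 0.492; \quad r_{23} = \frac{3.816}{[0.775 \times 32.694]^{1/2}} = 0.758; \quad r_{31} = \frac{11.245}{[32.694 \times 4.489]^{1/2}} = 0.927.$$

$$R = \begin{vmatrix} 1 & 0.492 & 0.927 \\ 0.492 & 1 & 0.758 \\ 0.927 & 0.758 & 1 \end{vmatrix} = 0.0155;$$

$R_{11} = [1 - (0.758)^2] = 0.4254$; $R_{12} = -[0.492 - 0.758 \times 0.927] = 0.2107$;
$R_{13} = [0.492 \times 0.758 - 0.927] = -0.5541$.

Hence, by (6.17.1), $r_{1.23} = [1 - R/R_{11}]^{1/2} = [1 - 0.0155/0.4254]^{1/2} = 0.982$.

Regression equation of $x_1$ on $x_2$ and $x_3$ is (6.16.10(a)):

$$\frac{R_{11}}{s_1}(x_1 - \bar{x}_1) + \frac{R_{12}}{s_2}(x_2 - \bar{x}_2) + \frac{R_{13}}{s_3}(x_3 - \bar{x}_3) = 0,$$

where $\bar{x}_1 = \bar{X}_1 + 4 = 3.714$, $\bar{x}_2 = \bar{X}_2 + 3 = 2.714$, $\bar{x}_3 = \bar{X}_3 + 20 = 19.857$,
and $s_1 = (4.489)^{1/2} = 2.11873$, $s_2 = (0.775)^{1/2} = 0.88034$,
$s_3 = (32.694)^{1/2} = 5.71787$.

Hence the required regression equation is

$$0.20072(x_1 - 3.714) + 0.24024(x_2 - 2.714) - 0.09708(x_3 - 19.857) = 0$$

or $$20.1x_1 + 24.0x_2 - 9.7x_3 + 53.0 = 0.$$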
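As a numerical check on the worked example, here is a minimal Python sketch (not part of the original text; the helper names `mean`, `cov` and `corr` are illustrative). It recomputes the total correlations from the seven observations, forms the cofactors $R_{11}$, $R_{12}$, $R_{13}$ as defined in 6.16, and then obtains $r_{1.23}$ from (6.17.1) and the partial regression coefficients. Because it carries full precision throughout, its figures can differ in the third decimal place from a hand computation rounded at each step.

```python
import math

# Data of the worked example (7 observations of three variates).
x1 = [5, 3, 2, 4, 3, 1, 8]
x2 = [2, 4, 2, 2, 3, 2, 4]
x3 = [21, 21, 15, 17, 20, 13, 32]


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Covariance with divisor N, as used in the text."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))


r12, r23, r13 = corr(x1, x2), corr(x2, x3), corr(x1, x3)

# Cofactors of the first row of the correlation determinant R.
R11 = 1 - r23 ** 2
R12 = -(r12 - r13 * r23)
R13 = r12 * r23 - r13
R = R11 + r12 * R12 + r13 * R13        # expansion by the first row

r1_23 = math.sqrt(1 - R / R11)         # multiple correlation (6.17.1)

s1, s2, s3 = (math.sqrt(cov(v, v)) for v in (x1, x2, x3))
b12_3 = -(s1 / s2) * R12 / R11         # partial regression coefficients
b13_2 = -(s1 / s3) * R13 / R11

print(f"r12={r12:.3f} r23={r23:.3f} r13={r13:.3f}")
print(f"R={R:.4f} r1.23={r1_23:.3f}")
print(f"x1 - {mean(x1):.3f} = {b12_3:.3f}(x2 - {mean(x2):.3f})"
      f" + {b13_2:.3f}(x3 - {mean(x3):.3f})")
```

A useful consistency test is that the residuals of the fitted plane are uncorrelated with both $x_2$ and $x_3$, which is exactly what the normal equations (6.16.5) and (6.16.6) require.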



Exercise: Calculate $r_{2.31}$ and $r_{3.12}$ and find the other two regression
equations.

6.19. Partial Correlation. In many situations where we have
three or more associated variates, it is useful to obtain some
measure of the correlation between two of them when the
influence of the others has been eliminated.

Suppose we have the three variates, $x_1, x_2, x_3$, with regression
equations (6.16.1), (6.16.2) and (6.16.3). Let $x_3$ be held constant;
then at this value of $x_3$, the two partial regression lines
of $x_1$ on $x_2$ and of $x_2$ on $x_1$ will have regression coefficients $b_{12.3}$
and $b_{21.3}$. In line with (6.6.14), we therefore define the partial
correlation of $x_1$ and $x_2$ to be given by

$$r_{12.3}^2 = (b_{12.3} \times b_{21.3}). \tag{6.19.1}$$

Likewise

$$r_{23.1}^2 = (b_{23.1} \times b_{32.1})$$

and $$r_{31.2}^2 = (b_{31.2} \times b_{13.2}).$$

It follows that

$$r_{12.3}^2 = \Big(-\frac{s_1R_{12}}{s_2R_{11}}\Big) \times \Big(-\frac{s_2R_{21}}{s_1R_{22}}\Big) = \frac{R_{12}^2}{R_{11}R_{22}} = \frac{(r_{12} - r_{13}r_{23})^2}{(1 - r_{23}^2)(1 - r_{31}^2)},$$

i.e., $$r_{12.3} = \frac{-R_{12}}{[R_{11}R_{22}]^{1/2}} = \frac{r_{12} - r_{13}r_{23}}{[(1 - r_{23}^2)(1 - r_{31}^2)]^{1/2}}. \tag{6.19.2}$$

In practice it is seldom found that $r_{12.3}$ is independent of $x_3$,
and we therefore regard the value of $r_{12.3}$ given by (6.19.2) as a
rough average over the varying values of $x_3$.

Using the data of 6.18, where $r_{12} = 0.492$, $r_{23} = 0.758$ and
$r_{31} = 0.927$, so that $-R_{12} = 0.492 - 0.758 \times 0.927 = -0.2107$,
$R_{11} = 0.4254$ and $R_{22} = 1 - (0.927)^2 = 0.1407$, we have

$$r_{12.3} = \frac{-0.2107}{[0.4254 \times 0.1407]^{1/2}} = -0.861.$$
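Formula (6.19.2) can be compared with a second, equivalent route to the same quantity: remove from $x_1$ and from $x_2$ the part linearly accounted for by $x_3$, and correlate what is left. The Python sketch below is illustrative only (the function names are not from the text); it uses the seven observations of the worked example in 6.18 and shows that the two routes agree.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held constant, as in (6.19.2)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def mean(v):
    return sum(v) / len(v)


def corr(u, v):
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residuals(u, v):
    """Residuals of u after a simple linear regression of u on v."""
    mu, mv = mean(u), mean(v)
    slope = (sum((a - mu) * (c - mv) for a, c in zip(u, v))
             / sum((c - mv) ** 2 for c in v))
    return [(a - mu) - slope * (c - mv) for a, c in zip(u, v)]


# Data of the worked example in 6.18.
x1 = [5, 3, 2, 4, 3, 1, 8]
x2 = [2, 4, 2, 2, 3, 2, 4]
x3 = [21, 21, 15, 17, 20, 13, 32]

by_formula = partial_corr(corr(x1, x2), corr(x1, x3), corr(x2, x3))
by_residuals = corr(residuals(x1, x3), residuals(x2, x3))
from_rounded = partial_corr(0.492, 0.927, 0.758)   # three-decimal inputs

print(f"{by_formula:.4f} {by_residuals:.4f} {from_rounded:.3f}")
```

The negative sign is worth noticing: although $x_1$ and $x_2$ are positively correlated in total, they are negatively associated once $x_3$ is held constant.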



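Mathematical Note to Chapter Six

If a, b, c, d are any four numbers we denote the quantity
$ad - bc$ by $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$. Thus $\begin{vmatrix} 1 & 3 \\ 5 & 7 \end{vmatrix} = 1 \times 7 - 3 \times 5 = -8$.

Such a function of its four elements is called a determinant of
order 2, having 2 x 2 elements. A determinant of order
three has 3 x 3 elements and is written

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$

the suffixes of an element indicating the row and column it
occupies in the determinant.

Suppose we select any element, $a_{21}$ say, and rule out the
row and column in which it occurs. We are left with the 4
elements $\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}$. The determinant of these four numbers
multiplied by $(-1)^{2+1} = -1$ is called the cofactor of $a_{21}$ in
the determinant. The cofactor of $a_{33}$ is

$$(-1)^{3+3}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.$$

In general, we denote the cofactor of an element $a_{ij}$ by $A_{ij}$.

The value, $\Delta$, of the determinant

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

may be obtained by forming the sum of the products of the
elements of any row (or column) and their respective cofactors.
Thus

$$\Delta = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} = a_{21}A_{21} + a_{22}A_{22} + a_{23}A_{23} = a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33}$$

$$= a_{11}A_{11} + a_{21}A_{21} + a_{31}A_{31} = a_{12}A_{12} + a_{22}A_{22} + a_{32}A_{32} = a_{13}A_{13} + a_{23}A_{23} + a_{33}A_{33}.$$

For instance

$$\begin{vmatrix} 1 & -2 & 3 \\ 7 & 4 & -1 \\ 2 & -3 & 2 \end{vmatrix} = 1\begin{vmatrix} 4 & -1 \\ -3 & 2 \end{vmatrix} + (-1) \times (-2)\begin{vmatrix} 7 & -1 \\ 2 & 2 \end{vmatrix} + 3\begin{vmatrix} 7 & 4 \\ 2 & -3 \end{vmatrix}$$

$$= (8 - 3) + 2(14 + 2) + 3(-21 - 8) = -50.$$

Suppose now the elements of any row (or column) are
proportional to those of another row (or column). Say, for
example, we have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ \lambda a_{11} & \lambda a_{12} & \lambda a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\lambda a_{12}a_{33} - a_{11}\lambda a_{13}a_{32} - a_{12}\lambda a_{11}a_{33} + a_{12}\lambda a_{13}a_{31} + a_{13}\lambda a_{11}a_{32} - a_{13}\lambda a_{12}a_{31} = 0.$$

In fact, if any two rows (or columns) of a determinant are
identical or their elements are proportional, the value of the
determinant is zero.
Now let us write

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}A_{21} + a_{12}A_{22} + a_{13}A_{23}.$$

Then, clearly, $a_{11}A_{21} + a_{12}A_{22} + a_{13}A_{23} = 0$.

In fact, if we form the sum of the products of the elements of
any row (or column) and the cofactors of the corresponding
elements of another row (or column), that sum is zero.

(The reader will find a useful introduction to determinants
in C. A. B. Smith's Biomathematics (Griffin); a more detailed
discussion is to be found in Professor A. C. Aitken's
Determinants and Matrices (Oliver & Boyd).)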
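The two rules of this Note (expansion by the cofactors of any row, and the vanishing of a sum formed with the cofactors of a different row) are easily verified by machine. The following Python sketch is illustrative only; the function names are not from the text, and the recursive expansion is meant for small determinants such as those of this chapter.

```python
def minor(m, i, j):
    """The array left after ruling out row i and column j (0-based)."""
    return [[m[r][c] for c in range(len(m)) if c != j]
            for r in range(len(m)) if r != i]


def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)))


def cofactor(m, i, j):
    """Signed minor: (-1)^(i+j) times the determinant of the minor."""
    return (-1) ** (i + j) * det(minor(m, i, j))


A = [[1, -2, 3],
     [7, 4, -1],
     [2, -3, 2]]

# Expansion by each row in turn gives the same value.
by_rows = [sum(A[i][j] * cofactor(A, i, j) for j in range(3)) for i in range(3)]

# Elements of row 1 with the cofactors of row 2 ("alien" cofactors).
alien = sum(A[0][j] * cofactor(A, 1, j) for j in range(3))

print(det(A), by_rows, alien)
```

The second rule is what makes the last two equations of (6.16.9) hold, since each pairs the elements of one row of R with the cofactors of the first row.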
EXERCISES ON CHAPTER SIX



1. Calculate the means of the following values of x and y:

    x    0.25   1.00   2.25   4.00   6.25
    y    0.12   0.00   2.13   3.84   6.07

The corresponding values of x and y satisfy approximately the
equation y = mx + c. By the method of least squares obtain the
best values of the constants m, c, assuming that there is error in the
y values only.

2. Daily Newspapers (London and Provincial) 1930-40:

    Year                             1930   1931   1932   1933   1934   1935
    Number                            109    104    150    157    147    148
    Average circulation (millions)   17.0   17.6   17.9   18.2   18.0   18.2

    Year                             1936   1937   1938   1939   1940
    Number                            148    145    142    141    131
    Average circulation (millions)   18.5   19.1   19.2   19.5   18.9

Fit a straight line to each of these series by the method of least
squares, represent graphically and comment on the fit. (L.U.)

3. Calculate the coefficient of correlation between the continuous
variables x and y from the data of the following table:

                                       x
    y          -4 to  -3 to  -2 to  -1 to   0 to   1 to   2 to   Total
                 -3     -2     -1      0      1      2      3
    -3 to -2    150     40     20     10      -      -      -     220
    -2 to -1     20     60     90     20     10      -      -     200
    -1 to 0      10     40     60     50     20      -      -     180
     0 to 1       -     30     36     42     20     16      6     150
     1 to 2      10     16     30     20     16      6      2     100
     2 to 3      24     38     48     34      6      -      -     150
    Total       214    224    284    176     72     22      8   1,000

(I.A.)

4. Calculate the correlation coefficient for the following U.S.
data:

    Index of income payments     114   137   172   211   230   239
    Index of retail food prices   97   105   124   139   130   139

(L.U.)

5. An ordinary pack of 52 cards is dealt to four whist players.
If one player has r hearts, what is the average number held by his
partner? Deduce that the correlation coefficient between the
number of hearts in the two hands is $-\tfrac{1}{3}$. (R.S.S.)

6. In the table below, verify that the means of the x-arrays are
collinear, and also those of the y-arrays, and deduce that the
correlation coefficient is -0.535.



1 



0 

y\ 

3 



12 
12 



18 
54 
18 



4 

30 
30 
4 



(R.S.S.) 

7. The ranks of the same 15 students in Mathematics and Latin
were as follows, the two numbers within brackets denoting the ranks
of the same student: (1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3),
(8, 1), (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13).
Show that the rank correlation coefficient is 0.51.

(Weatherburn.)

8. From the table below compute the correlation ratio of y on x
and the correlation coefficient:

    Values of x       0.5-1.5   1.5-2.5   2.5-3.5   3.5-4.5   4.5-5.5   Total
    Number of cases      20        30        35        25        15      125
    Mean y              11.3      12.7      14.5      16.8      19.1

The standard deviation of y is 3.1. [Hint: Use (6.15.1).]

(L.U.)

9. The three variates $x_1, x_2, x_3$ are measured from their means.
$s_1 = 1$; $s_2 = 1.3$; $s_3 = 1.9$; $r_{12} = 0.370$; $r_{13} = -0.641$;
$r_{23} = -0.736$. Calculate $r_{13.2}$. If $x_4 = x_1 + x_2$, obtain $r_{42}$, $r_{43}$ and $r_{43.2}$.
Verify that the two partial correlation coefficients are equal and
explain this result. (L.U.)

Solutions

1. m = 0.798; c = 0.084. 3. r = 0.330.

4. 0.91. 8. e = 0.77; r = 0.85.

9. $r_{13.2} = -0.586$; $r_{42} = 0.874$; $r_{43} = -0.836$; $r_{43.2} = -0.586$.



CHAPTER SEVEN

SAMPLE AND POPULATION

I : SOME FUNDAMENTALS OF SAMPLING THEORY 

7.1. Inferences and Significance. So far we have been con- 
cerned with problems of descriptive statistics : we have con- 
centrated on describing distributions, summarising their main 
properties mathematically and establishing certain general 
principles exemplified by them. We have not as yet used 
these summaries and general principles for other purposes. 
This we must now start to do. For one of the fundamental 
problems of statistics is : 

How, and with what accuracy, may we draw inferences 
about the nature of a population when we have only the 
evidence of samples of that population to go on ? 

Suppose, for example, that we wish to find whether among 
males in the British Isles belonging to some specified age-group 
there is any correlation between height and weight. In 
practice, we cannot weigh and measure every individual 
belonging to this " population ". We therefore resort to 
sampling. Common sense tells us, first, that, other things 
being equal, the larger the sample, the better any estimate we 
base on our examination of that sample; and, secondly, that,
whatever the size of the sample, that sample must be a 
representative one. 

Assuming, for the time being, that we have settled on the size 
of the sample or samples we shall take, how do we make sure 
that the sample will be representative, a random sample ? This 
is our first problem. 

Suppose, however, that we are satisfied that our method of 
sampling is of a kind to ensure random samples : we take our 
samples, measure the height and weight of each individual in a 
sample, and calculate a value for r, the correlation coefficient, 
based on the sample size, N. Immediately a host of new
doubts and misgivings arise : 

How do we know that the value obtained for r is really
significant? Could it not have arisen by chance? Can
we be reasonably sure that, although the variate-values
obtained from a sample show a certain degree of
correlation, in the population as a whole the variates are
correlated?

Suppose we obtain from a second sample of N a different
value for r; or suppose that with a different value of N
we obtain yet another value for r. Which, if any, of these
values shall we use as the best estimate of $\rho$, the
correlation coefficient in the population?

Clearly, unless we can establish some general rules of 
guidance on such matters, all our descriptive analysis will be 
of little use. This is one of the main tasks of that branch of 
statistics usually termed Sampling Theory. 

Before starting a more detailed discussion, let us set down 
what appear to be a few of the main types of problem with 
which the necessity of making statistical inference — inference 
from sample to population based on probabilities, confronts
us : 

(a) There are those problems involved in the concept of 
randomness and in devising methods of obtaining random 
samples. 

(b) There are those problems which arise from the variation,
from sample to sample of the same population, of the 
various sample statistics — problems concerned with the 
distribution of sample statistics. 

(c) There are those problems connected with how to estimate
population parameters from sample statistics and with the 
degree of trustworthiness of such estimates. 

And, lastly, 

(d) There are those problems which arise when we seek to 
test a hypothesis about a population or set of populations 
in the light of evidence afforded by sampling, problems, 
broadly, of significance. 

7.2. What Do We Mean by " Random "? Unfortunately it 
is not possible here to enter into a detailed discussion of the 
difficulties involved in the concept of randomness. The
" dictionary definition " of random sample is usually something 
along the following lines : 

A sample obtained by selection of items of a population 
is a random sample from that population if each item in 
the population has an equal chance of being selected. 

Like most dictionary definitions, this one is not really very 
satisfactory, for, as the reader will realise, it has the air of trying
desperately to disguise something that looks suspiciously like
circularity. Nevertheless, we must reconcile ourselves, here 
at least, to using it. It has, however, this virtue, that it brings 
out the fact that the adjective random applies to the method of 
selection rather than to any characteristic of the sample 
detected after it has been drawn. In this connection, two 
other, related, points must be made : 

(1) What we are out to get when we sample a population is
information about that particular population in respect 
of some specified characteristic, or set of characteristics, 
of the items of that population. When sampling, we 
should keep asking ourselves, " What precisely are we
trying to find out about what population ? " 

(2) A method that ensures random selection from one 
population need not necessarily do so when used to 
sample another population. 

What are the main types of population we may sample?

In the first place, there are those populations which actually 
exist and are finite. Because all measurement entails
approximation, the distribution of any variate in such a population is
necessarily discrete. There are two ways of sampling such a
population : after selecting an item, we may either replace it 
or we may not. Sampling without replacement will eventually 
exhaust a finite population and automatically, after each 
selection, the probability of any item being selected is altered. 
Sampling with replacement, however, can never exhaust even a 
finite population, and is thus equivalent to sampling from a 
hypothetical infinite population. If the probability of any 
item in the population being chosen is constant throughout 
the sampling process, we call the sampling simple. Thus, with 
a stable population, sampling with replacement is simple 
sampling. It may happen, however, that a population is so 
large that even sampling without replacement does not 
materially alter the probability of an item being selected. In 
such a case, sampling without replacement approximates to 
simple sampling. 
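The distinction between the two ways of sampling a finite population can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the population of ten numbered items is hypothetical, the seed is arbitrary, and the helper `chance_of_given_item` is introduced here only for illustration. The standard-library calls `random.Random.choices` (with replacement) and `random.Random.sample` (without replacement) do the drawing.

```python
import random
from fractions import Fraction

rng = random.Random(0)                 # arbitrary seed, for repeatability
population = list(range(1, 11))        # hypothetical population of 10 items

# With replacement: we may draw more items than the population contains.
with_replacement = rng.choices(population, k=25)

# Without replacement: ten draws exhaust the population.
without_replacement = rng.sample(population, k=10)


def chance_of_given_item(N, already_drawn):
    """Chance that a particular undrawn item is taken at the next draw
    when sampling without replacement from N items."""
    return Fraction(1, N - already_drawn)


# In a small population the chance alters markedly from draw to draw ...
small = [chance_of_given_item(10, k) for k in (0, 5, 9)]
# ... in a large one (58,703 items, 19 already drawn) hardly at all.
large = [chance_of_given_item(58703, k) for k in (0, 19)]

print(small)
print(float(large[1] / large[0]))
```

The last ratio is very close to 1, which is the sense in which sampling without replacement from a large population approximates to simple sampling.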

The second type of population we are likely to encounter are 
theoretical or conceptual populations. The difference between 
an actual population and a conceptual one is illustrated when 
we compare a truck-load of granite chips with, say, the popula- 
tion of all real numbers between 0 and 1. Conceptual popula- 
tions may be finite or infinite, but any infinite population is 
necessarily conceptual; so is any population in which the 
variate is continuous. Apart from their intrinsic interest,
conceptual populations are important because they can be used
as models of actual populations or arise in the solution of 
problems concerned with actual populations. 

Finally, there are " populations " such as that of " all 
possible throws of a die " or that of " all possible measurements 
of this steel rod ". These are certainly not existing populations
like a truck-load of granite chips, nor are they anything like as 
definite as the population of " all real numbers between 0 and 
1 ", which is, mathematically, precise. There are many 
difficulties with such " populations ". Can we, for instance,
regard the result of six throws of an actual die as a random 
sample of some hypothetical population of " all possible 
throws " ? And, since there is no selection, no choice, can they 
be regarded as constituting a random sample ? And in what 
way do we conceive essentially imaginary members of such a 
" population " as having the same probability of being selected 
as those members which, in Kendall's phrase, " assume the 
mantle of reality " ? Perhaps all we can say at the moment 
is that such " populations " receive their ultimate justification 
in the empirical fact that some events do happen as if they are
random samples of such " populations ". 

7.3. Random Sampling. We come then to the very much 
more practical question of how to draw random samples from 
given populations for specific purposes. Certain general 
principles should be borne in mind. 

To begin with, successful sampling demands specialised 
knowledge of the type of population to be sampled. For example, 
successful sampling of the varieties of birds visiting a given 
10 acres of common land during a certain period of the year 
requires that the sampling scheme be drawn up with the 
assistance of ornithologists intimately acquainted with the 
habits of possible visitors. 

On the other hand, the method of selection must be independent 
of the property or variate in which we are interested. If we wish
to sample a truck-load of granite chips, just arrived in the 
siding, for chip-size, it would be fatal to assume that, since the 
chips have been thoroughly shaken up on the journey, any 
shovelful will provide us with a random sample. A moment's 
reflection will convince us that there will have been at least a 
tendency for the more massive chips to gravitate towards the 
bottom of the truck, while the lighter and smaller tend to come 
to the top. However, had we been interested in sampling a 
given number of churns of milk for fat content, an adequate
sampling scheme would have been to select a number of the 
churns at random and, then, having thoroughly stirred their
contents, to ladle out a given quantity from each. But how
can we select a number of churns at random? Can we rely on
"haphazard" human choice? The answer is "No". In- 
deed, we should seek to eliminate the human factor as far as 
possible. For experience tells us that human choice is certainly 
not random in accordance with the definition we have here 
adopted. Even in cases where, at first sight, bias would seem 
hardly likely, as, for instance, in choosing the final digit in a 
set of four digit numbers, bias is most definitely operative. So 
to eliminate this factor, we resort to a number of methods, of 
which but two can be mentioned here. 

7.4. Ticket Sampling. The first method is ticket sampling. 
Let us assume that we have a finite population of N items. We
construct a model of this population as follows : 

On N similar cards we write down the relevant features 
of each member of the population, shuffle the cards 
thoroughly and draw n cards, say, representing a sample 
of n from the actual population. 

This is a fairly reliable method, but, if the population is 
large, involves much work preparing the model. Moreover, to 
ensure that shuffling is really thorough is by no means as 
simple as it sounds. 

7.5. Random Sampling Numbers. The second method is the 
method of using random sampling numbers. Given a finite 
population, we assign to each item of this population an ordinal 
number 1, 2, 3, ... N. This set of numbers is virtually a 
conceptual model of the actual population. 

Suppose we wish to draw a sample of n. We use a table of 
random numbers. Among the best known are : 

L. H. C. Tippett, Tracts for Computers, No. 15, giving 
10,400 four-figure numbers, composed of 41,600 digits. 

M. G. Kendall and B. Babington Smith, Tracts for Com- 
puters, No. 24, giving 100,000 digits grouped in twos and 
fours and in 100 separate thousands. 

R. A. Fisher and F. Yates, Statistical Tables for Biological,
Agricultural and Medical Research, giving 15,000 digits 
arranged in twos. 

A Million Random Digits, published by the Rand Corporation,
Santa Monica, California, giving five-figure
numbers (see Table 7.1). 

We do not " pick out " numbers haphazardly from such 
tables as these. Indeed, it is essential not to do so, for it 
is extremely likely that if this is done the bias of number-preference,
which we seek to eliminate by using such tables,
will operate once more. Instead, having stated the table of 
numbers used and having indicated which section is taken, we 
should work systematically through that section. An example 
will make the procedure clear. 

Example : Draw a random sample of 20 from the " population " in
Table 5.1. Calculate the mean of the sample and compare it 
with the population mean (see 5.6. Example 1, where the mean is 
found to be 67-852 in.). 

Treatment : referring to the table, we number the items as
follows :

    Height (in.)     Frequency     Sampling Number
    59 and under          23              1-23
    60-                  169             24-192
    61-                  439            193-631
    62-                1,030            632-1,661
    63-                2,116          1,662-3,777
    64-                3,947          3,778-7,724
    65-                5,965          7,725-13,689
    66-                8,012         13,690-21,701
    67-                9,089         21,702-30,790
    68-                8,763         30,791-39,553
    69-                7,132         39,554-46,685
    70-                5,314         46,686-51,999
    71-                3,320         52,000-55,319
    72-                1,884         55,320-57,203
    73-                  876         57,204-58,079
    74-                  383         58,080-58,462
    75-                  153         58,463-58,615
    76-                   63         58,616-58,678
    77 and over           25         58,679-58,703
    Total             58,703

We now read off from Table 7.1 20 successive five-figure numbers less
than 58,704, ignoring those greater than 58,703. We thus obtain :

23780 28391 05940 55583 45325 05490 11186 15357 11370
42789 29511 55968 17264 37110 08853 44155 44236 10089
44373 21149

Our sample of 20 is consequently made up as follows :

2 items from the 64- class ; 4 items from the 65- class ;

3 items from the 66- class; 3 items from the 67- class;

1 item from the 68- class ; 5 items from the 69- class ;

2 items from the 72- class.

Taking the mid-values of the classes as those of 5.6, e.g., the mid-value
of the 65- class to be 65.375 in., the mean value of the sample is
immediately found to be 1351.5/20 = 67.575 in., as compared with
67.852 in., the population mean.
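The bookkeeping of this example (cumulating the frequencies, locating each sampling number in its class, and averaging the class mid-values) can be sketched in Python as follows. The sketch is illustrative, not part of the original: the class frequencies are those of the example's table, with the 65- class taken as 5,965 because the sampling-number ranges require it, and the mid-value of each class is assumed to be its lower label plus 0.375 in., as the text states for the 65- class.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

labels = list(range(59, 78))           # 59 stands for "59 and under", 77 for "77 and over"
freqs = [23, 169, 439, 1030, 2116, 3947, 5965, 8012, 9089, 8763,
         7132, 5314, 3320, 1884, 876, 383, 153, 63, 25]

# Last sampling number belonging to each class: 23, 192, 631, ...
upper = list(accumulate(freqs))

# The 20 five-figure numbers not exceeding 58,703 read from Table 7.1.
numbers = [23780, 28391, 5940, 55583, 45325, 5490, 11186, 15357, 11370,
           42789, 29511, 55968, 17264, 37110, 8853, 44155, 44236, 10089,
           44373, 21149]


def class_label(k):
    """Label of the class whose range of sampling numbers contains k."""
    return labels[bisect_left(upper, k)]


sample_classes = [class_label(k) for k in numbers]
composition = Counter(sample_classes)
sample_mean = sum(label + 0.375 for label in sample_classes) / len(numbers)

print(sorted(composition.items()))
print(sample_mean)
```

`bisect_left(upper, k)` returns the index of the first class whose upper sampling number is at least k, so a number equal to a class boundary (23, say) falls in the lower class, as the table requires.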

Exercise : It is desired to obtain random samples from the following : 
(i) a truck-load of granite chips; (ii) a forest of mixed hard- 
wood and softwood; (iii) the population of London; (iv) all the
cattle in Oxfordshire ; (v) the varieties of birds visiting a given 
10 acres of common land ; (vi) plants in a very large area of the 
Scottish Highlands. 

Explain the principles which would guide you in collecting such 
samples. (L.U.) 

Table 7.1. Random Numbers 

(From A Million Random Digits, published for the Rand Corporation
by the Free Press (Glencoe, Illinois), previously published in
the Journal of the American Statistical Association, Vol. 48, No. 264,
December 1953.)

23780 28391 05940 55583 81256 45325 05490 65974 11186 15357
88240 92457 89200 94096 11370 42789 69758 79701 29511 55968
97523 17264 82840 59556 37110 08853 59083 95137 76538 44155
80274 79932 44236 10089 44373 82805 21149 03426 17694 31427
04971 19055 95091 08367 28381 03606 46497 28626 87297 36568
67286 28749 81905 15038 38338 65670 72111 91884 66762 11428
14262 09513 25728 52539 86806 57375 85062 89178 08791 39312
39483 62469 30935 79270 91980 51206 65749 11885 49789 97081
70908 21506 16269 54558 18395 69944 65036 63213 56631 88862
94963 22581 17882 83558 31960 99286 45236 47427 74321 67351

Note : This table gives 500 random digits, grouped for convenience
in five-digit numbers. Suppose six random numbers less
than 161 are required; read off successive groups of three successive
digits, rejecting those greater than 160. The result is :

(237) (802) (839) 105 (940) (555) (838) 125 (645) (325) 054
(906) (597) (411) (186) 153 (578) (824) 092 (457) (892) 009

The numbers in brackets are those greater than 160 and are rejected.
The six random numbers less than 161 so obtained are therefore :

105 125 54 153 92 and 9
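The procedure of the Note, reading successive groups of digits and rejecting those outside the required range, can be written as a short Python function. This is a sketch with assumptions labelled: the function name is hypothetical, the digit string is the opening of Table 7.1 as implied by the groups listed in the Note, and a group of 000 is treated as rejected on the assumption that the items are numbered from 1.

```python
def numbers_from_digits(digits, width, upper, how_many):
    """Read successive non-overlapping groups of `width` digits and keep
    those from 1 to `upper` inclusive, until `how_many` have been found."""
    accepted = []
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        if 1 <= value <= upper:          # assumption: 000 is rejected as well
            accepted.append(value)
            if len(accepted) == how_many:
                break
    return accepted


# First fourteen five-figure numbers of the table, run together.
digits = "".join(["23780", "28391", "05940", "55583", "81256",
                  "45325", "05490", "65974", "11186", "15357",
                  "88240", "92457", "89200", "94096"])

six = numbers_from_digits(digits, width=3, upper=160, how_many=6)
print(six)
```

Working systematically through the digits in this way, rather than picking groups by eye, is what keeps number-preference out of the selection.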

7.6. The Distribution of Sample Statistics. In the example 
of the previous section, we saw that our sample mean differed 
somewhat from the mean of the population. A different 
sample of 20 would have yielded a different value, also differing 
from the population mean. Were we to take a large number 
of samples of 20 we should have what in fact would be a 
frequency distribution of the mean of so many samples of 20.
Suppose we drew every possible sample of 20 from the population
of 58,703. This would give us, for sampling with replacement,
the enormous number of $58{,}703^{20}/20!$ such samples.
(How?) If we drew the relative frequency polygon for this
distribution, it would approximate closely to a continuous
probability curve, with its own mean, variance and moments
of higher order. Likewise, other sample statistics, the sample
variance, for instance, also have their distributions for samples
of a given size. So the question arises:

What do we know about the distribution of sample
statistics when: (a) the population sampled is not specified,
and (b) the population is specified?

7.7. The Distribution of the Sample Mean. We begin by
recalling our definition of a random sample (7.2) and then, to
make it of practical, mathematical use, reformulate it as
follows:

Definition: Random Sample. If $x_i$, $(i = 1, 2, 3, \ldots n)$,
is a set of n statistically independent variates, each distributed
with the same probability density $\phi(x)$, and if the
joint probability density of the set is given by
$f(x_1, x_2, \ldots x_n) = \phi(x_1)\,\phi(x_2) \ldots \phi(x_n)$,
then the set $x_i$, $(i = 1, 2, 3, \ldots n)$, is a random sample
of n from a population whose probability density is $\phi(x)$.

Now suppose from the n variates $x_i$, with means $\mu_1'(x_i)$, we
form a new variate

$$X = a_1x_1 + a_2x_2 + \ldots + a_ix_i + \ldots + a_nx_n = \sum_{i=1}^{n} a_ix_i \tag{7.7.1}$$

where the a's are arbitrary constants, not all zero. We have

$$\mathcal{E}(X) = \mathcal{E}\Big(\sum_{i=1}^{n} a_ix_i\Big) = \sum_{i=1}^{n} a_i\,\mathcal{E}(x_i)$$

or

$$\mu_1'(X) = \sum_{i=1}^{n} a_i\,\mu_1'(x_i) \tag{7.7.2}$$

If we put $a_i = 1/n$, for all i, and subject the $x_i$ to the condition
that they all have the same probability density, we have

$$\mathcal{E}(x_1) = \mathcal{E}(x_2) = \ldots = \mathcal{E}(x_n) = \mathcal{E}(x) = \mu,$$

and X becomes $(x_1 + x_2 + \ldots + x_n)/n$, the mean $\bar{x}$ of a
sample of n from a population $\phi(x)$ of mean $\mu$. So (7.7.2)
becomes

$$\mathcal{E}(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n}\mathcal{E}(x_i) = \frac{1}{n}\,.\,n\,\mathcal{E}(x) = \mu \tag{7.7.3}$$

Thus

the mean value of the mean of all possible samples of n is
the mean of the population sampled: or

the mean of the sampling distribution of $\bar{x}$ is the population
mean.

If n = 1, that this is so is obvious, for, taking all possible
samples of 1, the distribution of the sample mean is identical
with the distribution of the individual items in the population.
(7.7.3) is also the justification of the common-sense view that
the average of a number of measurements of the "length" of
a rod, say, is a better estimate of the "length" of the rod than
any one measurement.

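What of the variance of $\bar{x}$? We write $\mu_i$ for the mean of
$x_i$, $\mu_X$ for that of X, and $\sigma_i^2$ and $\sigma_X^2$ for the variances of $x_i$
and X, respectively. Also let $\rho_{ij}$ be the coefficient of
correlation between $x_i$ and $x_j$ $(i \neq j)$, assuming such correlation
to exist.

Then

$$(X - \mu_X)^2 = \Big(\sum_i a_i(x_i - \mu_i)\Big)^2 = \sum_i a_i^2(x_i - \mu_i)^2 + \sum_i\sum_j a_ia_j(x_i - \mu_i)(x_j - \mu_j), \quad (i \neq j)$$

So

$$\mathcal{E}[(X - \mu_X)^2] = \sum_i a_i^2\,\mathcal{E}[(x_i - \mu_i)^2] + \sum_i\sum_j a_ia_j\,\mathcal{E}[(x_i - \mu_i)(x_j - \mu_j)], \quad (i \neq j)$$

But

$$\mathcal{E}[(x_i - \mu_i)^2] = \mathcal{E}(x_i^2 - 2\mu_ix_i + \mu_i^2) = \mathcal{E}(x_i^2) - \mu_i^2 = \sigma_i^2$$

Also, when $i \neq j$,

$$\mathcal{E}[(x_i - \mu_i)(x_j - \mu_j)] = \mathcal{E}(x_ix_j - \mu_ix_j - \mu_jx_i + \mu_i\mu_j) = \mathcal{E}(x_ix_j) - \mu_i\mu_j = \operatorname{cov}(x_i, x_j) = \sigma_i\sigma_j\rho_{ij}$$

Hence

$$\sigma_X^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2 + \sum_{i=1}^{n}\sum_{j=1}^{n} a_ia_j\sigma_i\sigma_j\rho_{ij}, \quad (i \neq j) \tag{7.7.4}$$

This is an important formula in its own right. If, however, the
variates $x_i$ are independent, then $\rho_{ij} = 0$, for all i, j, and

$$\sigma_X^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \ldots + a_n^2\sigma_n^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2 \tag{7.7.4(a)}$$

Again putting $a_i = 1/n$, for all i, and subjecting the x's to the
condition that they all have the same probability density $\phi(x)$,
so that $X = \bar{x}$, the mean of a sample of n from $\phi(x)$, and
$\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_n^2 = \sigma^2$, say, we have, since $x_i$ and
$x_j$ $(i \neq j)$ are independent,

$$\sigma_X^2 = \sigma_{\bar{x}}^2 = \sigma^2/n \quad \text{or} \quad \sigma_{\bar{x}} = \sigma/\sqrt{n} \tag{7.7.5}$$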

Thus:

The variance of the distribution of the sample mean is
1/n times the variance of the parent population, n being
the size of the sample.

In other words, the larger the sample, the more closely the
sample means cluster about the population mean value.

The standard deviation of the sampling distribution of $\bar{x}$ is
usually called the Standard Error of the Mean, and, in general,
the standard deviation of the sampling distribution of any
statistic is called the Standard Error of that statistic.

The above results hold for any population with a finite variance,
no matter how the variate is distributed. Moreover, it is known
that, whatever such population is sampled, as the sample size
increases, the distribution of $\bar{x}$ tends towards normality; while,
even for relatively small values of n, there is evidence that the
$\bar{x}$-distribution is approximately normal.
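Results (7.7.3) and (7.7.5) can be checked exactly on a small case. The sketch below is only an illustration: the five-item population and the helper names `mean` and `variance` are hypothetical. It lists every ordered sample of 3 drawn with replacement (each equally likely) and uses exact fractions, so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Variance with divisor equal to the number of values."""
    values = list(values)
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)

population = [Fraction(v) for v in (1, 2, 3, 4, 6)]   # hypothetical population
n = 3

# Every ordered sample of n drawn with replacement.
sample_means = [mean(s) for s in product(population, repeat=n)]

print(mean(sample_means), mean(population))               # both sides of (7.7.3)
print(variance(sample_means), variance(population) / n)   # both sides of (7.7.5)
```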

7.8. The Distribution of $\bar{x}$ when the Population Sampled is
Normal. Consider a normal population defined by

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\,[-(x - \mu)^2/2\sigma^2].$$

Here, we recall, $\mu$ is the population mean and $\sigma^2$, the population
variance.

The mean-moment generating function for a normal distribution
of variance $\sigma^2$ is $M_m(t) = \exp(\tfrac{1}{2}\sigma^2t^2)$, (5.4.2), and the
function generating the moments about the origin is

$$M(t) = M_m(t)\exp(\mu t) = \exp(\mu t + \tfrac{1}{2}\sigma^2t^2).$$

Now, remembering that the m.g.f. for any distribution is
$M(t) \equiv \mathcal{E}(\exp xt)$, the m.g.f. of the mean, $\bar{x}$, of the n independent
variates $x_i$, $(i = 1, 2, 3, \ldots n)$, each with probability function
$\phi(x)$, is

$$\mathcal{E}(\exp \bar{x}t) = \mathcal{E}\Big(\exp \sum_{i=1}^{n} x_it/n\Big) = \mathcal{E}\Big(\prod_{i=1}^{n} \exp x_it/n\Big) = \prod_{i=1}^{n}\,[\mathcal{E}(\exp x_it/n)]$$

But, since $(x_1, x_2, \ldots x_n)$ is a sample from $\phi(x)$,

$$\mathcal{E}(\exp \bar{x}t) = (\mathcal{E}(\exp xt/n))^n = (M(t/n))^n = \exp\,(n(\mu t/n + \tfrac{1}{2}\sigma^2t^2/n^2)),$$

i.e., the m.g.f. for $\bar{x}$ is

$$\exp\,(\mu t + \tfrac{1}{2}(\sigma^2/n)t^2) \tag{7.8.1}$$

But this is the m.g.f. of a normal population with mean $\mu$ and
variance $\sigma^2/n$. Hence:

The mean of samples of n from a normal population
$(\mu, \sigma)$ is itself normally distributed about $\mu$ as mean with
standard deviation (error) $\sigma/\sqrt{n}$.

This is in agreement with (7.7.3) and (7.7.5), but adds the
important information that the distribution of $\bar{x}$, when the
population sampled is normal, is itself normal. Actually this
is a particular case of a more general theorem:

If $x_1, x_2, \ldots x_n$ are n independent variates normally
distributed about a common mean (which may be taken at
zero), with variances $\sigma_1^2, \sigma_2^2, \ldots \sigma_n^2$, then any linear
function of these n variates is itself normally distributed.

The proof is simple: Let $X = \sum_{i=1}^{n} a_ix_i$. The m.g.f. of
$x_i$, $M_i(t)$, is $\exp(\tfrac{1}{2}\sigma_i^2t^2)$ and therefore

$$M_X(t) = \mathcal{E}(\exp Xt) = \mathcal{E}\Big(\exp \sum_i a_ix_it\Big) = \mathcal{E}\Big(\prod_i \exp a_ix_it\Big).$$

But the x's are independent, and, so,

$$M_X(t) = \prod_i \mathcal{E}(\exp a_ix_it) = \prod_i \mathcal{E}(\exp x_i(a_it)) = \prod_i M_i(a_it) = \exp\Big(\tfrac{1}{2}t^2\sum_i a_i^2\sigma_i^2\Big)$$

which is the m.g.f. of a normal distribution with variance

$$\sigma_X^2 = \sum_i a_i^2\sigma_i^2 \tag{7.8.2}$$

Consequently,

(i) if n = 2 and $a_1 = a_2 = 1$, the distribution of the sum,
$x_1 + x_2$, of the two normal variates $x_1$, $x_2$, is normal
about the common mean (taken at zero) with variance
$\sigma_1^2 + \sigma_2^2$; and

(ii) if n = 2 and $a_1 = 1$, $a_2 = -1$, the distribution of the
difference, $x_1 - x_2$, of the two normal variates $x_1$, $x_2$, is
normal about zero, also with variance $\sigma_1^2 + \sigma_2^2$, since
$a_2^2 = 1$ in (7.8.2).

Now in many of the problems we encounter, n is large, and
so we may assume that $\bar{x}$, the mean of a sample of n from
an infinite population of mean $\mu$ and variance $\sigma^2$, is approximately
normally distributed about $\mu$ as mean with variance
$\sigma^2/n$, although we do not know that the population sampled is
normal. This being the case, the variate $t = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ will
be approximately normally distributed about zero with unit
variance. In other words:

The probability that the sample mean, $\bar{x}$, will differ
numerically from the population mean, $\mu$, by less than an
amount d (measured in units of the standard error of the
sample mean) is given approximately by

$$P(|\bar{x} - \mu| < d\,\sigma/\sqrt{n}) = P(|t| < d) = 2\int_0^d \phi(t)\,dt$$

where $\phi(t) = \dfrac{1}{\sqrt{2\pi}} \exp(-\tfrac{1}{2}t^2)$.

It must be emphasised, however, that here we are sampling
a population whose variance is assumed known. When this is
not the case, the problem is complicated somewhat and will be
dealt with later.
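The probability that a unit normal variate lies within d standard errors of zero can be evaluated directly, since twice the integral of the unit normal density from 0 to d equals erf(d/√2). A minimal sketch follows; the function name `prob_within` is hypothetical.

```python
import math

def prob_within(d):
    """P(|t| < d) for a unit normal variate t, i.e. 2 * integral of phi from 0 to d."""
    return math.erf(d / math.sqrt(2.0))

for d in (1.0, 2.0, 3.0):
    print(d, round(prob_within(d), 4))
```

For d = 2 this gives about 0.9545; the figure 0.9544 used in the worked examples comes from doubling the rounded table value 0.4772.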

7.9. Worked Examples. 

1. The net weight of "half-pound" boxes of chocolates has a
mean of 0.51 lb. and a standard deviation of 0.02 lb. The
chocolates are despatched from manufacturer to wholesaler in
consignments of 2,500 boxes. What proportion of these consignments
can be expected to weigh more than 1,276 lb. net? What
proportion will weigh between 1,273 and 1,277 lb. net?

Treatment: We are here drawing samples of 2,500 from an assumed
infinite population. The mean net weight of boxes in a consignment
weighing 1,276 lb. will be 1,276/2,500 lb. = 0.5104 lb.

The standard error of the mean net weight for samples of 2,500
will be $0.02/\sqrt{2{,}500} = 0.0004$ lb. Thus in this case
$t = 0.0004/0.0004 = 1$. The probability that the sample mean will
exceed the population mean by more than this amount is
$P(t > 1) = 0.5 - P(0 < t \le 1)$, for this is a "one-tail" problem.
$P(0 < t \le 1) = 0.3413$. Therefore $P(t > 1) = 0.1587$. Therefore just
under 16% of the consignments of 2,500 boxes will weigh more than
1,276 lb.

If a consignment weighs 1,273 lb., the mean weight of a box in
that consignment is 0.5092 lb. The deviation from the mean is then
-0.0008 lb., or, in standardised units, -2. If the consignment
weighs 1,277 lb., the corresponding mean weight is 0.5108 lb., a
deviation from the population mean of +2 standard errors. The
probability that a consignment will weigh between these two limits
is then, this being a "two-tail" problem,

$$P(-2 < t < 2) = 2P(0 < t \le 2) = 2 \times 0.4772 = 0.9544$$

In other words, just over 95% of the batches of 2,500 boxes will lie
between the given net weights.
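The arithmetic of this example can be reproduced with the standard library's `statistics.NormalDist`. The sketch assumes, as the treatment does, that the mean box weight of a consignment is normally distributed; the helper name `t_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.51, 0.02, 2500
se = sigma / sqrt(n)                    # standard error of the mean box weight

def t_value(total_weight):
    """Standardised deviation of a consignment's mean box weight."""
    return (total_weight / n - mu) / se

unit_normal = NormalDist()              # mean 0, standard deviation 1
p_over_1276 = 1 - unit_normal.cdf(t_value(1276))
p_between = unit_normal.cdf(t_value(1277)) - unit_normal.cdf(t_value(1273))
print(round(se, 6), round(p_over_1276, 4), round(p_between, 4))
```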

2. The "guaranteed" average life of a certain type of electric
light bulb is 1,000 hours with a standard deviation of 125 hours.
It is decided to sample the output so as to ensure that 90% of the
bulbs do not fall short of the guaranteed average by more than
2.5%. What must be the minimum sample size?

Treatment: Let n be the size of a sample that the conditions may
be fulfilled. Then the standard error of the mean is $125/\sqrt{n}$. Also
the deviation from the 1,000-hour mean must not be more than
25 hours, or, in standardised units, not more than
$25/(125/\sqrt{n}) = \sqrt{n}/5$. This is a "one-tail" problem, for we do
not worry about those bulbs whose life is longer than the guaranteed
average.

$$P(t > \sqrt{n}/5) = 0.1 \quad \text{and, so,} \quad P(0 < t \le \sqrt{n}/5) = 0.4$$

Using Table 5.4, we find that t = 1.28. Therefore $\sqrt{n}/5 = 1.28$
or n = 40.96. Consequently, the required minimum sample size is 41.

3. The means of simple samples of 1,000 and 2,000 are 67.5 and
68.0 respectively. Can the samples be regarded as drawn from a
single population of standard deviation 2.5?

Treatment: Just as the sample mean has its own distribution, so,
too, does the difference between the means of two samples of a given
size. If $x_1$ and $x_2$ are two independent variates distributed about
means $\mu_1$, $\mu_2$ with variances $\sigma_1^2$, $\sigma_2^2$, respectively, let $X = x_1 - x_2$.
Then $\mathcal{E}(X) = \mu_1 - \mu_2$ and, by 7.7.4(a), $\sigma_X^2 = \sigma_1^2 + \sigma_2^2$. Now set up
the hypothesis that two samples of $n_1$ and $n_2$ are drawn from a
single population $(\mu, \sigma)$. Then, if $\bar{x}_1$ and $\bar{x}_2$ are the means of these
two samples, $X = \bar{x}_1 - \bar{x}_2$ is distributed about zero mean with variance
$\sigma_X^2 = \sigma^2/n_1 + \sigma^2/n_2$. And, since $\bar{x}_1$ and $\bar{x}_2$ are approximately
normally distributed, so is X.

In our example, X = 0.5 and $\sigma_X^2 = (2.5)^2(1/1{,}000 + 1/2{,}000) =
0.009375$ or $\sigma_X = 0.0968$. Thus the observed X is more than 5 times
the standard error of its distribution calculated on the hypothesis that
the samples are from the same population with standard deviation
2.5. This, on the assumption that X is approximately normally
distributed, is most unlikely. We therefore reject the hypothesis
that the samples are from a single population of standard deviation
2.5.
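The standard error of the difference and its ratio to the observed difference can be checked as follows. This is a sketch only: the function name `se_of_difference` is hypothetical, and the two sample means are taken to be 67.5 and 68.0, giving the observed difference of 0.5 used in the treatment.

```python
from math import sqrt

def se_of_difference(sigma, n1, n2):
    """Standard error of the difference of two independent sample means."""
    return sigma * sqrt(1 / n1 + 1 / n2)

observed_difference = 68.0 - 67.5
se = se_of_difference(2.5, 1000, 2000)
ratio = observed_difference / se        # difference in units of its standard error
print(round(se, 4), round(ratio, 2))
```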

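7.10. Sampling Distribution of the Mean when Sampling is
without Replacement from a Finite Population. Let the
population sampled consist of N items. Let the sample
size be n. The number of samples of n that can be drawn
without replacement from N is $\binom{N}{n}$. In these samples any
one value of the variate, $x_i$, say, figures $\binom{N-1}{n-1}$ times, for,
if $x_i$ is chosen, there are but $\binom{N-1}{n-1}$ ways of forming
samples of n. Let the population mean be zero. Then
$\sum_{i=1}^{N} x_i = 0$. If $m_j$ is the mean of the jth sample, let m be the
mean value of all possible values of $m_j$. Denoting the sum
of the items in the jth sample by $\big(\sum_{i=1}^{n} x_i\big)_j$, we have

$$m_j = \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)_j \quad \text{and} \quad \binom{N}{n}\,m = \sum_{j} m_j$$

Consequently,

$$n\sum_{j=1}^{\binom{N}{n}} m_j = \sum_{j}\Big(\sum_{i=1}^{n} x_i\Big)_j = \binom{N-1}{n-1}\sum_{i=1}^{N} x_i,$$

since each $x_i$ occurs $\binom{N-1}{n-1}$ times, and so $n\sum_j m_j = 0$, i.e.,
m = 0. Thus the mean of the means of all possible samples is the
mean of the parent population.

If $\sigma^2$ is the variance of the population, we have, taking the
population mean as origin, $N\sigma^2 = \sum_{i=1}^{N} x_i^2$.
Moreover, if $\sigma_m^2$ is the variance of the sample mean,

$$\binom{N}{n}\sigma_m^2 = \sum_j m_j^2 = \frac{1}{n^2}\sum_j\Big(\sum_{i=1}^{n} x_i\Big)_j^2 = \frac{1}{n^2}\Big[\binom{N-1}{n-1}\sum_{i=1}^{N} x_i^2 + \binom{N-2}{n-2}\sum_i\sum_j x_ix_j\Big], \quad (i \neq j),$$

since each pair $x_i$, $x_j$ occurs together in $\binom{N-2}{n-2}$ samples.
But

$$\Big(\sum_{i=1}^{N} x_i\Big)^2 = \sum_{i=1}^{N} x_i^2 + \sum_i\sum_j x_ix_j, \quad (i \neq j),$$

and, since $\sum_{i=1}^{N} x_i = 0$, $\sum_i\sum_j x_ix_j = -\sum_{i=1}^{N} x_i^2 = -N\sigma^2$. Therefore

$$\binom{N}{n}\sigma_m^2 = \frac{N\sigma^2}{n^2}\Big[\binom{N-1}{n-1} - \binom{N-2}{n-2}\Big]
\quad \text{and} \quad
\sigma_m^2 = \frac{\sigma^2}{n}\,.\,\frac{N - n}{N - 1}$$

If we let $N \to \infty$, $\sigma_m^2 \to \sigma^2/n$, showing that when a population
is very large, sampling without replacement approximates
to simple sampling.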
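The finite-population result, variance of the sample mean equal to (σ²/n)(N − n)/(N − 1), can be verified exactly by listing every sample drawn without replacement from a small population. The population values and helper names below are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Variance with divisor equal to the number of values."""
    values = list(values)
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)

population = [Fraction(v) for v in (2, 3, 5, 7, 11, 13)]   # hypothetical values
N, n = len(population), 3

# Every possible sample of n drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]
predicted = variance(population) / n * Fraction(N - n, N - 1)

print(mean(sample_means), mean(population))
print(variance(sample_means), predicted)
```

The population mean here is not zero; taking it as origin in the derivation is a convenience only and does not affect the variance.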

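7.11. Distribution of the Sample Variance. Let $s^2$ be the
variance of a sample of n and let $\bar{x}$ be the sample mean. Then

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big)^2
= \frac{n-1}{n^2}\sum_{i=1}^{n} x_i^2 - \frac{1}{n^2}\sum_i\sum_j x_ix_j, \quad (i \neq j)$$

Consequently

$$\mathcal{E}(s^2) = \frac{n-1}{n^2}\,\mathcal{E}\Big(\sum_i x_i^2\Big) - \frac{1}{n^2}\,\mathcal{E}\Big(\sum_i\sum_j x_ix_j\Big), \quad (i \neq j)$$

But there are n ways of choosing $x_i$ from n values and, once $x_i$
is chosen, there are (n - 1) ways of choosing $x_j$ so that $x_j \neq x_i$.
Also, since $x_i$ and $x_j$ are independent,

$$\mathcal{E}\Big(\sum_i\sum_j x_ix_j\Big) = \sum_i\sum_j\,[\mathcal{E}(x_ix_j)] = \sum_i\sum_j\,[\mathcal{E}(x_i)\,.\,\mathcal{E}(x_j)] = \sum_i\sum_j\,(\mathcal{E}(x))^2 = 0$$

since $\mathcal{E}(x) = \mu_1' = 0$, the population mean being taken as origin.
Therefore

$$\mathcal{E}(s^2) = \frac{n-1}{n^2}\,.\,n\mu_2' = \frac{n-1}{n}\,\mu_2 = \frac{n-1}{n}\,\sigma^2 \tag{7.11.1}$$

$\mu_1'$, $\mu_2'$ and $\mu_2 = \sigma^2$ being population parameters.

Thus the mean value of the variance of all possible samples of
n is (n - 1)/n times the variance of the parent population; and,
as we should expect, $\mathcal{E}(s^2) \to \sigma^2$, as the sample size increases
indefinitely.

Thus if we draw a single sample of n from a population and
calculate the variance, $s^2$, of that sample, we shall have

$$\mathcal{E}(ns^2/(n-1)) = \mathcal{E}\Big(\sum_i (x_i - \bar{x})^2/(n-1)\Big) = \sigma^2 \tag{7.11.2}$$

In other words:

If we calculate $\sum_i (x_i - \bar{x})^2/(n-1)$, instead of the usual
$\sum_i (x_i - \bar{x})^2/n$, we have an unbiased estimate of the population
variance, $\sigma^2$.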
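The bias factor (n − 1)/n can be seen exactly on a small case. The sketch below, with a hypothetical four-item population and hypothetical helper names, averages the sample variance over every equally likely ordered sample of 3 drawn with replacement, once with divisor n and once with divisor n − 1.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Variance with divisor n, as s^2 is defined in the text."""
    values = list(values)
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)

population = [Fraction(v) for v in (0, 1, 4, 9)]    # hypothetical population
n = 3
sigma2 = variance(population)
samples = list(product(population, repeat=n))       # all ordered samples

mean_s2 = mean(variance(s) for s in samples)
mean_corrected = mean(variance(s) * Fraction(n, n - 1) for s in samples)

print(mean_s2, sigma2 * Fraction(n - 1, n))   # biased: equals (n-1)/n times sigma^2
print(mean_corrected, sigma2)                 # unbiased: equals sigma^2
```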

Of course, the actual figure calculated from the data of a
single sample will, in general, differ from the actual value of
$\sigma^2$. But, if we continue sampling and calculating $ns^2/(n-1)$,
we shall obtain a set of values, the mean value of which will
tend to the actual value of $\sigma^2$ as the number, N, of samples of
n drawn increases. A function of x and n which in this way
yields, as its values, unbiased estimates of a population parameter
is called an unbiased estimator of that parameter. Thus
if $\theta$ is a population parameter and $\hat{\theta}$ (read "theta cap") is an
estimator of $\theta$, $\hat{\theta}$ is an unbiased estimator if $\mathcal{E}(\hat{\theta}) = \theta$.
$m_1' = \sum_i x_i/n$ is an unbiased estimator of $\mu$, the population mean,
since $\mathcal{E}(m_1') = \mu$; on the other hand, $s^2 = \sum_i (x_i - \bar{x})^2/n$ is a
biased estimator of $\sigma^2$. For this reason, some writers define
the variance of a sample to be $\sum_i f_i(x_i - \bar{x})^2/(n-1)$, $\sum_i f_i = n$;
with this definition, the sample variance is an unbiased estimate
of the population variance. Although we have not adopted
this definition here, we are introduced by it to an important
notion: that of the degrees of freedom of a sample.

Let $\bar{x}$ be the mean of a sample of n. Then $n\bar{x} = \sum_i x_i$ and,
for a given $\bar{x}$, there are only n - 1 independent x's, for when
we have selected n - 1 items of the sample, the nth is necessarily
determined. $\sum_i x_i = n\bar{x}$ is a linear equation of constraint
on the sample.

If there are p (< n, of course) linear equations of constraint
on the sample, the number of degrees of freedom of the sample
is reduced by p.

We must now ascertain the standard error of the sample
variance. Once again we confine our attention to the case
when the population sampled is normal with zero mean and
variance $\sigma^2$.

What is the probability that a sample $(x_1, x_2, \ldots x_n)$ from
such a population is such that its mean lies between $\bar{x} \pm \tfrac{1}{2}d\bar{x}$
and its standard deviation between $s \pm \tfrac{1}{2}ds$?

Since the n x's are independent yet are drawn from the
same population, the probability that the n values of x shall
lie simultaneously between
$x_1 \pm \tfrac{1}{2}dx_1$, $x_2 \pm \tfrac{1}{2}dx_2$, $\ldots$ $x_n \pm \tfrac{1}{2}dx_n$ is

$$dp = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\,[-(x_1^2 + x_2^2 + \ldots + x_n^2)/2\sigma^2]\,dx_1\,dx_2 \ldots dx_n \tag{7.11.3}$$

Now think of $(x_1, x_2, \ldots x_n)$ as the co-ordinates of a point
P in a space of n dimensions. Then $dx_1\,dx_2 \ldots dx_n$ is an
element of volume in that space. Call it dv. Then dp is the
probability that the point P shall lie within this volume
element. If now we choose this volume element dv in such a
way that any point P, lying within it, represents a sample of
n with mean lying between $\bar{x} \pm \tfrac{1}{2}d\bar{x}$ and a standard deviation
between $s \pm \tfrac{1}{2}ds$, dp will be the probability that our sample has
a mean and standard deviation lying between these limits.
Our problem therefore is to find an appropriate dv. Now we
have the two equations,

$$\sum_{i=1}^{n} x_i = n\bar{x} \quad \text{and} \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 = ns^2.$$
Each of these equations represents a locus in our n-dimensional
space. If n were equal to 3, the equation $\sum_{i=1}^{3} x_i = 3\bar{x}$
may be written $(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) = 0$
and represents a plane through the point $(\bar{x}, \bar{x}, \bar{x})$. Moreover,
the length of the perpendicular from the origin on to this
plane is $3\bar{x}/3^{1/2} = \bar{x}\,.\,3^{1/2}$ and, so, the perpendicular distance
between this plane and the parallel plane through
$(\bar{x} + d\bar{x}, \bar{x} + d\bar{x}, \bar{x} + d\bar{x})$ is $d\bar{x}\,.\,3^{1/2}$. In the n-dimensional case, the
equation $\sum_{i=1}^{n} x_i = n\bar{x}$ represents a "hyperplane", as it is
called, and the "distance" between this "plane" and the
"parallel plane" is $d\bar{x}\,.\,n^{1/2}$.

Again, if n = 3, the equation $\sum_{i=1}^{n} (x_i - \bar{x})^2 = ns^2$ becomes
$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 = 3s^2$, and thus represents
a sphere, with centre $(\bar{x}, \bar{x}, \bar{x})$ and radius $s\,.\,3^{1/2}$. The plane
$\sum_{i=1}^{3} x_i = 3\bar{x}$ passes through the centre of this sphere. The
section will therefore be a circle of radius $s\,.\,3^{1/2}$, whose
area is proportional to $s^2$. If s increases from $s - \tfrac{1}{2}ds$ to
$s + \tfrac{1}{2}ds$, the increase in the area of section is proportional to
$d(s^2)$. So the volume, dv, enclosed by the two neighbouring spheres
and the two neighbouring planes will be proportional to $d\bar{x}\,.\,d(s^2)$. In
the n-dimensional case, instead of a sphere, we have a "hypersphere"
of "radius" $s\,.\,n^{1/2}$ and this "hypersphere" is cut by
our "hyperplane" in a section which now has a "volume"
proportional to $s^{n-1}$. So, in this case, dv is proportional to
$d\bar{x}\,.\,d(s^{n-1}) = k\,s^{n-2}\,ds\,d\bar{x}$, say. Consequently, the probability
that our sample of n will lie within this volume is given by

$$dp = \frac{k}{(2\pi\sigma^2)^{n/2}} \exp\Big\{-\sum_{i=1}^{n} x_i^2/2\sigma^2\Big\}\,s^{n-2}\,ds\,d\bar{x} \tag{7.11.3(a)}$$

But the equation $\sum_{i=1}^{n} (x_i - \bar{x})^2 = ns^2$ may be written
$\sum_{i=1}^{n} x_i^2 = n(s^2 + \bar{x}^2)$ and, therefore,

$$dp = k_1 \exp(-n\bar{x}^2/2\sigma^2)\,d\bar{x} \times k_2 \exp(-ns^2/2\sigma^2)\,(s^2)^{(n-3)/2}\,d(s^2) \tag{7.11.4}$$

where $k_1$ and $k_2$ are constants.¹

¹ Determination of $k_1$: Since

$$\int_{-\infty}^{+\infty} k_1 \exp(-n\bar{x}^2/2\sigma^2)\,d\bar{x} = 1$$

we have immediately (5.4(e) footnote) $k_1 = (2\pi\sigma^2/n)^{-1/2}$.
Determination of $k_2$: $s^2$ varies from 0 to $\infty$; therefore

$$\int_0^{\infty} k_2 \exp(-ns^2/2\sigma^2)\,(s^2)^{(n-3)/2}\,d(s^2) = 1$$

Put $ns^2/2\sigma^2 = x$; then

$$k_2 \int_0^{\infty} (2\sigma^2/n)^{(n-1)/2} \exp(-x)\,x^{(n-1)/2 - 1}\,dx = 1$$

But since, by definition (see Mathematical Note to Chapter Three),

$$\int_0^{\infty} \exp(-x)\,x^{(n-1)/2 - 1}\,dx = \Gamma((n-1)/2),$$

$$k_2 = (n/2\sigma^2)^{(n-1)/2}/\Gamma((n-1)/2).$$

We see immediately that, when the population sampled is
normal:

(i) the mean $\bar{x}$ and the variance $s^2$ of a sample of n are
distributed independently;

(ii) the sample mean is distributed normally about the population
mean, taken here at the origin, with variance
$\sigma^2/n$; and

(iii) the sample variance $s^2$ is not normally distributed.

The moment-generating function for the $s^2$-distribution is
$M(t) \equiv \mathcal{E}(\exp ts^2)$. Thus

$$M(t) = k_2 \int_0^{\infty} \exp(-ns^2/2\sigma^2)\,\exp(ts^2)\,(s^2)^{(n-3)/2}\,d(s^2)$$

Putting $X^2 = (ns^2/2\sigma^2)(1 - 2\sigma^2t/n)$, we have

$$M(t) = k_2 \Big[\frac{2\sigma^2}{n}\Big(1 - \frac{2\sigma^2t}{n}\Big)^{-1}\Big]^{(n-1)/2} \int_0^{\infty} \exp(-X^2)\,(X^2)^{(n-1)/2 - 1}\,d(X^2)$$

But, by 3.A.3,

$$\int_0^{\infty} \exp(-X^2)\,(X^2)^{(n-1)/2 - 1}\,d(X^2) = \Gamma((n-1)/2)$$

$$\therefore\ M(t) = \Big[1 - \frac{2\sigma^2t}{n}\Big]^{-(n-1)/2} \tag{7.11.5}$$

The coefficient of t in the expansion of this function is the
first moment of $s^2$ about the origin: $(n-1)\sigma^2/n$, as already
established. The coefficient of $t^2/2!$ is the second moment of
$s^2$ about the origin: $(n^2-1)\sigma^4/n^2$. Hence

$$\operatorname{var}(s^2) = (n^2-1)\sigma^4/n^2 - (n-1)^2\sigma^4/n^2 = 2(n-1)\sigma^4/n^2$$

For large samples, therefore, $\operatorname{var}(s^2) \approx 2\sigma^4/n$. In other words,
the standard error of the sample variance for a normal parent
population is approximately $\sigma^2\sqrt{2/n}$ for large samples.

7.12. Worked Example.

If $s_1^2$ and $s_2^2$ are the variances in two independent samples of
the same size taken from a common normal population, determine
the distribution of $s_1^2 + s_2^2$. (L.U.)

Treatment: The moment-generating function of $s^2$ for samples of
n from a normal population $(0, \sigma)$ is (7.11.5)

$$M(t) = (1 - 2\sigma^2t/n)^{-(n-1)/2}$$

Hence $\mathcal{E}(\exp ts_1^2) = \mathcal{E}(\exp ts_2^2) = M(t)$.

But, since the samples are independent,

$$\mathcal{E}(\exp t(s_1^2 + s_2^2)) = \mathcal{E}[(\exp ts_1^2)(\exp ts_2^2)] = [M(t)]^2$$

Hence the m.g.f. for the sum of the two sample variances is

$$(1 - 2\sigma^2t/n)^{-(n-1)}.$$

Expanding this in powers of t, we find that the mean value of
$s_1^2 + s_2^2$ is $2(n-1)\sigma^2/n$ (the coefficient of $t/1!$) and
$\operatorname{var}(s_1^2 + s_2^2) = 4n(n-1)\sigma^4/n^2 - 4(n-1)^2\sigma^4/n^2 = 4(n-1)\sigma^4/n^2$,
which, for large n, is approximately equal to $4\sigma^4/n$.

The probability differential for $s_1^2 + s_2^2$ is

$$dp = \frac{(n/2\sigma^2)^{n-1}}{\Gamma(n-1)} \exp(-n(s_1^2 + s_2^2)/2\sigma^2)\,(s_1^2 + s_2^2)^{n-2}\,d(s_1^2 + s_2^2)$$


EXERCISES ON CHAPTER SEVEN

1. Using the table of random numbers given in the text, draw a
random sample of 35 from the "population" in Table 5.1. Calculate
the sample mean and compare it with the result obtained in 7.5.

2. A bowl contains a very large number of black and white balls.
The probability of drawing a black ball in a single draw is p and
that of drawing a white ball, therefore, 1 - p. A sample of m balls
is drawn at random, and the number of black balls in the sample
is counted and marked as the score for that draw. A second sample
of m balls is drawn, and the number of white balls in this sample
is the corresponding score. What is the expected combined score?
Show that the variance of the combined score is 2mp(1 - p).

3. Out of a batch of 1,000 lb. of chestnuts from a large shipment, 
it is found that there are 200 lb. of bad nuts. Estimate the limits 
between which the percentage of bad nuts in the shipment is almost 
certain to lie. 

4. A sample of 400 items is drawn from a normal population whose 
mean is 5 and whose variance 4. If the sample mean is 4-45, can 
the sample be regarded as a truly random sample ? 

5. A sample of 400 items has a mean of 1-13; a sample of MO 
items has a mean of 1-01. Can the samples be regarded as having 
been drawn at random from a common population of standard 
deviation 0.1?

6. A random variate x is known to have the distribution

$$p(x) = c(1 + x/a)^{m-1} \exp(-mx/a), \quad -a \le x < \infty$$

Find the constant c and the first four moments of x. Derive the
linear relation between $\beta_1$ and $\beta_2$ of this distribution. (L.U.)

7. Pairs of values of two variables x and y are given. The variances
of x, y and (x - y) are $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{x-y}^2$ respectively. Show
that the coefficient of correlation between x and y is

$$\frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$$

(L.U.)

8. If u = ax + by and v = bx - ay, where x and y represent
deviations from respective means, and if the correlation coefficient
between x and y is $\rho$, but u and v are uncorrelated, show that

$$\sigma_u\sigma_v = (a^2 + b^2)\,\sigma_x\sigma_y\sqrt{1 - \rho^2}$$

(L.U.)

Solutions

2. Expected score is m.

3. Probability p of 1 lb. of bad nuts is 200/1,000 = 0.2. Assume this
is constant throughout the batch, q = 0.8. Mean is np and variance
is npq. For the proportion of bad nuts we divide the variate by
n and hence the variance by $n^2$, giving variance = pq/n. The
standard error of the proportion of bad nuts is
$\sqrt{pq/n} = \sqrt{0.2 \times 0.8/1{,}000} = 0.01264$. The probability that a
normal variate will differ from its mean value by more than three
times its standard error is 0.0027.
We can be practically sure that no deviation will be greater than this.
Required limits are therefore, for the % of bad nuts,
100(0.2 ± 3 × 0.01264) = 23.8% and 16.2%.

4. No, deviation of sample mean from population mean > 4 × S.E.
of mean for sample of size given.

5. Difference between means is nearly twice that of S . E of 
difference of means i/ — 0-8. 



6. Hint: $\int_{-a}^{\infty} p(x)\,dx = 1$. Transform by using the substitution
$1 + x/a = t/m$. $c = m^m e^{-m}/a\Gamma(m)$; the mean-moment generating
function is $e^{-at}(1 - at/m)^{-m}$; $\mu_1' = 0$; $\mu_2 = a^2/m$; $\mu_3 = 2a^3/m^2$;
$\mu_4 = 3a^4(m + 2)/m^3$; $2\beta_2 - 3\beta_1 = 6$.

8.1. The t-distribution. We have seen that if $\bar{x}$ is the mean
of a sample of n from a normal population $(\mu, \sigma)$ the variate

$$t \equiv (\bar{x} - \mu)/(\sigma/\sqrt{n})$$

is normally distributed about zero mean with unit variance.
But what if the variance of the parent population is unknown and
we wish to test whether a given sample can be considered to be
a random sample from that population? The best we can do
is to use an estimate of $\sigma$ derived from the unbiased estimate of
the variance based on our sample of n, i.e., $s\{n/(n-1)\}^{1/2}$,
$s^2$ being the sample variance. But if we do
this, we cannot assume that

$$t = (\bar{x} - \mu)(n-1)^{1/2}/s \tag{8.1.1}$$

is a normal variate. In fact, it is not. What is the distribution
of this t (called Student's t)?

Let the population mean be taken as origin, then

$$t \equiv (n-1)^{1/2}\,\bar{x}/s \tag{8.1.1(a)}$$

Since we may write $t \equiv (n-1)^{1/2}\{(\bar{x}/\sigma)/(s/\sigma)\}$, t, and, therefore,
the t-distribution, is independent of the population variance,
a most convenient consequence which contributes greatly to
the importance of the distribution. If we hold s constant we
have $s\,dt = (n-1)^{1/2}\,d\bar{x}$. Now $\bar{x}$ is normally distributed about
zero with variance $n^{-1}$ (since t is independent of $\sigma$, we may
take $\sigma = 1$) and, as we showed in the last chapter, $\bar{x}$ and $s^2$
are statistically independent, when the parent population is
normal. Thus

$$dp(\bar{x}) = (n/2\pi)^{1/2} \exp(-n\bar{x}^2/2)\,d\bar{x}$$

Consequently, the probability differential of t for a constant $s^2$
may be obtained from $dp(\bar{x})$ by using 8.1.1(a) and the relation
$s\,dt = (n-1)^{1/2}\,d\bar{x}$; we have

$$dp(t, \text{constant } s^2) = [n/2\pi(n-1)]^{1/2}\,s \exp\,[-ns^2t^2/2(n-1)]\,dt \tag{8.1.2}$$

If now we multiply this by the probability differential of $s^2$
and integrate with respect to $s^2$ from 0 to $\infty$, we obtain the
probability differential of t for all $s^2$. By (7.11.4), with $\sigma = 1$,

$$dp(s^2) = \frac{(n/2)^{(n-1)/2}}{\Gamma[(n-1)/2]} \exp(-ns^2/2)\,(s^2)^{(n-3)/2}\,d(s^2)$$

$$\therefore\ dp(t) = dt \int_0^{\infty} [n/2\pi(n-1)]^{1/2} \exp\,[-ns^2t^2/2(n-1)]\,s \times \frac{(n/2)^{(n-1)/2}}{\Gamma[(n-1)/2]} \exp(-ns^2/2)\,(s^2)^{(n-3)/2}\,d(s^2)$$

$$= \frac{(n/2)^{(n-1)/2}\,[n/2\pi(n-1)]^{1/2}}{\Gamma[(n-1)/2]}\,dt \int_0^{\infty} (s^2)^{(n-2)/2} \exp\,[-ns^2\{1 + t^2/(n-1)\}/2]\,d(s^2)$$

Putting $s^2 = 2v/n[1 + t^2/(n-1)]$, we have

$$dp(t) = \frac{(n/2)^{(n-1)/2}\,[n/2\pi(n-1)]^{1/2}}{\Gamma[(n-1)/2]} \times (2/n)^{n/2}\,[1 + t^2/(n-1)]^{-n/2}\,dt \int_0^{\infty} v^{n/2 - 1} \exp(-v)\,dv$$

$$= \frac{\Gamma(n/2)}{\sqrt{\pi}\,(n-1)^{1/2}\,\Gamma[(n-1)/2]}\,[1 + t^2/(n-1)]^{-n/2}\,dt
\quad \Big(\text{since } \int_0^{\infty} v^{n/2 - 1} \exp(-v)\,dv = \Gamma(n/2)\Big)$$

$$= \frac{\Gamma(n/2)}{\Gamma(\tfrac{1}{2})\,\Gamma[(n-1)/2]}\,(n-1)^{-1/2}\,[1 + t^2/(n-1)]^{-n/2}\,dt
\quad (\text{since } \sqrt{\pi} = \Gamma(\tfrac{1}{2}))$$

or

$$dp(t) = \frac{1}{(n-1)^{1/2}\,B\big(\tfrac{n-1}{2}, \tfrac{1}{2}\big)}\,[1 + t^2/(n-1)]^{-n/2}\,dt \tag{8.1.3}$$

We see at once that t is not normally distributed. If, for
instance, n = 2, we have $dp(t) = (1/\pi)(1 + t^2)^{-1}\,dt$, which defines
what is known as the Cauchy distribution, a distribution
departing very considerably from normality, the variance, for
example, being infinite. However, using Stirling's approximation
(5.2.7), it can be shown, as the reader should verify for
himself, that $B[(n-1)/2, 1/2]\,.\,(n-1)^{1/2}$ tends to $\sqrt{2\pi}$ as
$n \to \infty$; at the same time, since $[1 + t^2/(n-1)]^{-n/2}$ may be
written $[(1 + t^2/(n-1))^{(n-1)/2} \times (1 + t^2/(n-1))^{1/2}]^{-1}$, while
$(1 + x/m)^m \to \exp x$ as $m \to \infty$, $(1 + t^2/(n-1))^{-n/2} \to \exp(-t^2/2)$.

Thus the t-distribution approaches normality
$\Big(\dfrac{1}{\sqrt{2\pi}} \exp(-t^2/2)\Big)$ as n increases.
It is customary to put $\nu = (n - 1)$, the number of degrees of
freedom of the sample, and to write the probability function of t
for $\nu$ degrees of freedom thus

$$F_\nu(t) = \frac{1}{\nu^{1/2}\,B\big(\tfrac{\nu}{2}, \tfrac{1}{2}\big)}\,[1 + t^2/\nu]^{-(\nu+1)/2}$$

In his Statistical Methods for Research Workers, Sir R. A.
Fisher gives a table of the values of |t| for given $\nu$ which will
be exceeded, in random sampling, with certain probabilities
(P). Fisher's P is related to $F_\nu(t)$ by $P = 1 - 2\int_0^t F_\nu(t)\,dt$.
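The two limiting statements made about (8.1.3), that n = 2 gives the Cauchy form and that large n gives nearly the normal form, can be checked numerically. The sketch below writes the density in terms of ν = n - 1 and builds the Beta function from `math.gamma`; the function names are hypothetical.

```python
import math

def t_density(t, nu):
    """Density (8.1.3) with nu = n - 1: (1 + t^2/nu)^(-(nu+1)/2) / (sqrt(nu) B(nu/2, 1/2))."""
    beta = math.gamma(nu / 2) * math.gamma(0.5) / math.gamma((nu + 1) / 2)
    return (1 + t * t / nu) ** (-(nu + 1) / 2) / (math.sqrt(nu) * beta)

def normal_density(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cauchy_density(t):
    return 1 / (math.pi * (1 + t * t))

for nu in (1, 5, 30, 200):
    print(nu, round(t_density(1.0, nu), 5))
print("normal", round(normal_density(1.0), 5), "cauchy", round(cauchy_density(1.0), 5))
```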

8.2. Worked Example.

A random sample of 16 values from a normal population is
found to have a mean of 41.5 and a standard deviation of 2.795.
On this information is there any reason to reject the hypothesis
that the population mean is 43?

Treatment: $t = 1.5 \times 15^{1/2}/2.795 = 2.078$ for 15 degrees of freedom.
Entering Table 8.1 at $\nu = 15$, we find that the probability
of |t| > 1.75 is 0.10 and of |t| > 2.13 is 0.05. Thus, if the population
mean is 43, the probability of obtaining a value of |t| as large as
that observed is over 0.05. On the information provided
by the sample there is then no reason for rejecting the
hypothesis.
8.3. Confidence Limits. Suppose, in the above example, we
had wanted to find, from the sample data, the limits within
which the population mean will lie with a probability of 0.95.
We call these limits the 95% confidence limits of the population
mean for the sample in question. To find these limits, we put

$$|t| = \frac{|41.5 - \mu|}{2.795} \times 15^{1/2}$$

Entering Table 8.1 at $\nu = 15$, we find that the value of |t|
which will be exceeded with a probability of 0.05 is 2.13.
Hence

$$\frac{|41.5 - \mu|}{2.795} \times 15^{1/2} < 2.13$$

or

$$39.9 < \mu < 43.1$$
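The test of 8.2 and the limits above can be reproduced in a few lines. The sketch takes the sample standard deviation as 2.795 (an assumption, being the value consistent with the quoted t of 2.078) and uses 2.13, the value of |t| for 15 degrees of freedom and P = 0.05 quoted from Table 8.1; the Python standard library has no t-table of its own.

```python
from math import sqrt

n, sample_mean, s = 16, 41.5, 2.795   # s = 2.795 is an assumed reading of the data
nu = n - 1                            # degrees of freedom
t_05 = 2.13                           # |t| exceeded with probability 0.05, nu = 15

t_observed = abs(sample_mean - 43) * sqrt(nu) / s
half_width = t_05 * s / sqrt(nu)
lower, upper = sample_mean - half_width, sample_mean + half_width
print(round(t_observed, 3), round(lower, 2), round(upper, 2))
```

The computed limits are close to 39.96 and 43.04, which lie just inside the rounded limits 39.9 and 43.1 quoted above.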
Exercise : Show that the 08% confidence limits are 30-55 and 4:t-45. 
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Table 8.1. Values of \l\ for Degrees of Freedom Exceeded 
with Probability P in Random Sampling 

(Abridged, by permission of the author, Sir R. A. Fisher, and the 
publishers, Messrs. Oliver and Boyd, from Statistical Methods for 
Research Workers.) 



V. \. 

2t 


0-50 


010 


005 


002 


001 


1 


I 


6-34 


12-71 


31-82 


63-60 


2 


0-816 


2-02 


4-30 


6-96 


9-92 


3 


0-705 


2-35 


318 


4-54 


5-84 


4 


0*741 


213 


2-78 


3-75 


4-60 


5 


0-727 


202 


2-57 


3-36 


403 


0 


0-718 


1-94 


2-45 


314 


3-71 


7 


0-711 


1-90 


2-30 


300 


3-50 


fk 


0-706 


1-86 


— iii 


--.HI 


0 CD 


9 


0-703 


1-83 


2-26 


2-82 


3-25 


10 


0-700 


1-81 


2-23 


2-76 


317 


1 1 


0-607 


1-80 


2-20 


2-72 


311 


12 


0-095 


1-78 


2-18 


2-68 


306 


13 


0-694 


1-77 


216 


2-65 


301 


14 


0-692 


1-76 


2-14 


2-62 


208 


us 


0-691 


1-75 


213 


200 


2-95 


1 0 


ft* Kflll 

\l U1HP 


1 IO 


— I — 




.1 II.) 


17 


0-689 


1-74 


211 


2-57 


2-90 


18 


0088 


1-72 


210 


2-55 


2-88 . 


If) 


0-688 


1 73 


209 


2-54 


2-86 


20 


0-687 


1-72 


209 


2-53 


2-84 


26 


0 084 


1-71 


2-06 


2-48 


2-79 


30 


0-083 


1-70 


204 


2-46 


2-75 


35 


0-682 


109 


203 


2-44 


2-72 


40 


0 081 


108 


202 


2*42 


2-71 


45 


0-680 


1-68 


2-02 


2-41 


2-69 


SO 


0-679 


108 


201 


2-40 


2-68 


60 


0078 


1-67 


200 


2-39 


2-60 


as 


0-674 


1-64 


1-86 


2-33 


2-58 



In general, for a sample of n with mean $\bar{x}$ and variance $s^2$, 

$$|t| = |\bar{x} - \mu|\,\nu^{1/2}/s, \qquad \nu = n - 1.$$

And if $t_P$ is the value of t with a probability P of being exceeded 
for ν degrees of freedom, then the (1 - P)100% confidence 
limits for μ are: 

$$\bar{x} - s t_P/\nu^{1/2} < \mu < \bar{x} + s t_P/\nu^{1/2} \qquad (8.3.1)$$
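
The limits in (8.3.1) are easily computed once $t_P$ has been read from Table 8.1. The following is a minimal sketch only: the function name `t_confidence_limits` is an assumption made for illustration, s is taken to be the sample standard deviation with divisor n (the convention used in this chapter), and the figures are borrowed from the rifle-range example of 8.5.

```python
import math


def t_confidence_limits(mean, s, n, t_p):
    """Return (lower, upper) limits mean -/+ s * t_p / sqrt(n - 1), as in (8.3.1).

    s is the sample standard deviation computed with divisor n, and
    t_p is the tabulated value of t for nu = n - 1 degrees of freedom.
    """
    nu = n - 1
    half_width = s * t_p / math.sqrt(nu)
    return mean - half_width, mean + half_width


# Illustrative figures: n = 10, mean 5.2, s = 7.026, and t = 2.26
# (the 0.05 value for nu = 9 in Table 8.1).
lower, upper = t_confidence_limits(5.2, 7.026, 10, 2.26)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Because the interval just includes zero, these figures agree with a t-test that narrowly fails to reach the 5% level.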

8.4. Other Applications of the t-distribution. It has been 
shown by Sir R. A. Fisher that: 

If t is a variate which is a fraction, the numerator of 
which is a normally distributed statistic and the denominator 
the square root of an independently distributed and unbiased 
estimate of the variance of the numerator with ν 
degrees of freedom, then t is distributed with probability 
function 

$$dp(t) = \frac{1}{\nu^{1/2}\,B(\nu/2,\ 1/2)}\,(1 + t^2/\nu)^{-(\nu+1)/2}\,dt$$

Problem: Given two independent samples of $n_1$ and $n_2$ values with 
means $\bar{x}$ and $\bar{X}$, how can we test whether they are drawn from the 
same normal population? 

We begin by setting up the hypothesis that the samples are from 
the same population. Let $x_i$ $(i = 1, 2, \ldots, n_1)$ and $X_j$ 
$(j = 1, 2, \ldots, n_2)$ be the two samples. Then 
$\bar{x} = \sum_{i=1}^{n_1} x_i/n_1$ and $\bar{X} = \sum_{j=1}^{n_2} X_j/n_2$, 
while the sample variances are respectively 

$$s_1^2 = \sum_{i=1}^{n_1} (x_i - \bar{x})^2/n_1 \quad\text{and}\quad s_2^2 = \sum_{j=1}^{n_2} (X_j - \bar{X})^2/n_2.$$

These give unbiased estimates of the population variance, 
$n_1 s_1^2/(n_1 - 1)$ and $n_2 s_2^2/(n_2 - 1)$. 

Now since 

$$E(n_1 s_1^2 + n_2 s_2^2) = (n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2 = (n_1 + n_2 - 2)\sigma^2,$$

$$\hat{\sigma}^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2) \qquad (8.4.1)$$

gives an unbiased estimate of $\sigma^2$ based on the two samples, with 
$\nu = n_1 + n_2 - 2$ degrees of freedom. 

If our hypothesis is true, if, that is, our samples are from the 
same normal population, $\bar{x}$ and $\bar{X}$ are normally distributed about 
the population mean, with variances $\sigma^2/n_1$ and $\sigma^2/n_2$ respectively. 
Therefore (7.9 Example 3), since the samples are independent, the 
difference, $\bar{x} - \bar{X}$, of their means is normally distributed with 
variance $\sigma^2(1/n_1 + 1/n_2)$. It follows that $\hat{\sigma}^2(1/n_1 + 1/n_2)$ is an unbiased 
estimate of the variance of the normally distributed statistic $\bar{x} - \bar{X}$, 
and therefore, in accordance with the opening statement of this 
section, 

$$t = \frac{\bar{x} - \bar{X}}{\hat{\sigma}}\left[\frac{n_1 n_2}{n_1 + n_2}\right]^{1/2} \qquad (8.4.2)$$

is distributed like t with $\nu = n_1 + n_2 - 2$ degrees of freedom. 
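
The two steps (8.4.1) and (8.4.2) can be sketched in a few lines of code. This is an illustration under stated assumptions, not a definitive routine: the helper names `sample_stats` and `pooled_t` are hypothetical, variances use the divisor n as in the text, and the data are the two small samples of Exercise 3 at the end of this chapter.

```python
import math


def sample_stats(values):
    """Return (n, mean, s2), where s2 is the sample variance with divisor n."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / n
    return n, mean, s2


def pooled_t(n1, mean1, s1_sq, n2, mean2, s2_sq):
    """Return (t, nu) for two independent samples, following (8.4.1) and (8.4.2)."""
    nu = n1 + n2 - 2
    # (8.4.1): unbiased estimate of the common population variance
    sigma_hat = math.sqrt((n1 * s1_sq + n2 * s2_sq) / nu)
    # (8.4.2): difference of means scaled by its estimated standard error
    t = (mean1 - mean2) * math.sqrt(n1 * n2 / (n1 + n2)) / sigma_hat
    return t, nu


first = [20, 23, 24, 28, 22, 26]
second = [21, 24, 27, 26, 25]
t, nu = pooled_t(*sample_stats(first), *sample_stats(second))
print(f"t = {t:.3f} on {nu} degrees of freedom")
```

The value of |t| obtained is then compared with the entry of Table 8.1 for the same ν.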
8.5. Worked Example. 

1. Ten soldiers visit the rifle range two weeks running. The 
first week their scores were 

67, 24, 57, 55, 63, 54, 56, 68, 33, 43 

The second week they score, in the same order: 

70, 38, 58, 58, 56, 67, 68, 77, 42, 38 

Is there any significant improvement? How would the test be 
affected if the scores were not shown in the same order each time? 

(A.I.S.) 

Treatment: 

 1st week      2nd week 
 score (x).    score (X).    X - x    (X - x)²      x²        X² 

    67            70            3         9       4,489     4,900 
    24            38           14       196         576     1,444 
    57            58            1         1       3,249     3,364 
    55            58            3         9       3,025     3,364 
    63            56           -7        49       3,969     3,136 
    54            67           13       169       2,916     4,489 
    56            68           12       144       3,136     4,624 
    68            77            9        81       4,624     5,929 
    33            42            9        81       1,089     1,764 
    43            38           -5        25       1,849     1,444 

 520 (10x̄)    572 (10X̄)       52       764      28,922    34,458 

(1) We assume there is no significant improvement, that, 
consequently, both X and x are drawn from the same normal population 
and that, therefore, X - x is normally distributed about zero. 
Then, regarding the 10 values of X - x as our sample, we have 

$$s^2 = \operatorname{var}(X - x) = \sum (X - x)^2/n - (\bar{X} - \bar{x})^2 = 76.4 - 27.04 = 49.36;$$

and, therefore, s = 7.026. Hence 

$$t = (\bar{X} - \bar{x})(n - 1)^{1/2}/s = 5.2 \times 3/7.026 = 2.22$$

Entering Table 8.1 at ν = 9 we find that the probability with which 
t = 2.26 is exceeded is 0.05, while the probability with which t = 1.83 
is exceeded is 0.10. Therefore the result, while significant at the 
10% level, is not significant at the 5% level. We conclude, 
therefore, that there is some small evidence of improvement. 
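
The arithmetic of part (1) can be checked mechanically. The sketch below is illustrative only; the function name `paired_t` is an assumption, and the variance is taken with divisor n, as in the treatment above.

```python
import math

first_week = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
second_week = [70, 38, 58, 58, 56, 67, 68, 77, 42, 38]


def paired_t(before, after):
    """Return (t, nu) for the differences after - before, tested against zero."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Standard deviation of the differences, using divisor n as in the text
    s = math.sqrt(sum(d * d for d in diffs) / n - mean_diff ** 2)
    t = mean_diff * math.sqrt(n - 1) / s
    return t, n - 1


t, nu = paired_t(first_week, second_week)
print(f"t = {t:.2f} with nu = {nu}")
```

Pairing removes the large soldier-to-soldier variation from the comparison, which is why this test is more sensitive than one that uses only the two means.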

(2) Had the scores not been given in the same order, we should 
have had to rely on the difference between the mean scores. We 
again suppose that there has been no significant improvement and 
use the variate 

$$t = \frac{\bar{X} - \bar{x}}{\hat{\sigma}}\left(\frac{n_x n_X}{n_x + n_X}\right)^{1/2},$$
where $\hat{\sigma}^2 = (n_x s_x^2 + n_X s_X^2)/(n_x + n_X - 2)$. 

In the present case $n_x = n_X = 10$, and we have 

$10 s_x^2 = \sum x^2 - 10\bar{x}^2$ and $10 s_X^2 = \sum X^2 - 10\bar{X}^2$. 
$\therefore\ 10(s_x^2 + s_X^2) = \sum x^2 + \sum X^2 - 10(\bar{x}^2 + \bar{X}^2)$ 

$= 28{,}922 + 34{,}458 - 10(52^2 + 57.2^2) = 3{,}622.$ 
$\therefore\ \hat{\sigma}^2 = 10(s_x^2 + s_X^2)/18 = 201.2$ or $\hat{\sigma} = 14.18$. 

Consequently, $t = (5.2/14.18) \times (100/20)^{1/2} = 0.82$ for ν = 18 d.f. 

Entering Table 8.1 at ν = 18, we find that there is a 0.5 probability 
that t will exceed 0.688 and a probability of 0.10 that t will exceed 
1.73. Consequently, the result is not significant at the 10% level 
and there is no reason to reject the hypothesis that there has been no 
significant improvement. 

2. In an ordnance factory two different methods of shell-filling 
are compared. The average and standard deviation of weights in 
a sample of 96 shells filled by one process are 1.26 lb. and 0.013 lb., 
and a sample of 72 shells filled by the second process gave a mean 
of 1.28 lb. and a standard deviation of 0.011 lb. Is the difference 
in weights significant? (Brookes and Dick.) 

Treatment: Assuming that there is no significance in the difference 
of weights, 

$$\hat{\sigma}^2 = \frac{96 \times (0.013)^2 + 72 \times (0.011)^2}{96 + 72 - 2}$$

or $\hat{\sigma} = 0.0123$; 

$|\bar{x} - \bar{X}| = 0.02$ and $[n_x n_X/(n_x + n_X)]^{1/2} = (96 \times 72/168)^{1/2} = 6.41$. 

$\therefore\ |t| = 0.020 \times 6.41/0.0123 = 10.4$ for 166 degrees of freedom. 

Since ν is so large in this case, we may assume that t is normally 
distributed about zero mean with unit variance. Then |t| > 10 
standard deviations and is, therefore, highly unlikely by chance 
alone. The difference in the weights is, then, highly significant. 

8.6. The Variance-ratio, F. We now discuss a test of 
significance of the difference between the variances of two 
samples from the same population. Actually, if the sample 
variances are such that the two samples cannot have been 
drawn from the same population, it is useless to apply the 
t-test to ascertain whether the difference between the means is 
significant, for we assume in establishing that test that the 
samples are in fact from the same population. Thus, the 
present problem is logically the more fundamental. 

Problem: A standard cell, whose voltage is known to be 1.10 volts, 
was used to test the accuracy of two voltmeters, A and B. Ten 
independent readings of the voltage of the cell were taken with each 
voltmeter. The results were: 

A . 1.11 1.15 1.14 1.10 1.09 1.11 1.12 1.15 1.13 1.14 
B . 1.12 1.06 1.02 1.08 1.11 1.05 1.06 1.03 1.05 1.08 

Is there evidence of bias in either voltmeter, and is there any 
evidence that one voltmeter is more consistent than the other? 

(R.S.S.) 

We already know how to tackle the first part of the problem 
(see at the end of this section), but what about the second part? 
The consistency of either meter will be measured by the 
variance of the population of all possible readings of the 
voltmeter, and this variance will be estimated from the ten sample 
readings given. Thus we have to devise a test to compare the 
two estimates. This has been done by Sir R. A. Fisher, whose 
test is: 

If $u^2$ and $v^2$ are unbiased estimates of a population 
variance based on $n_1 - 1$ and $n_2 - 1$ degrees of freedom 
respectively (where $n_1$ and $n_2$ are the respective sample 
sizes), then by calculating z as $\tfrac{1}{2}\log_e(u^2/v^2)$ and using the 
appropriate tables given in Statistical Methods for Research 
Workers, we can decide whether the value of this variance 
ratio, $u^2/v^2$, is likely to result from random sampling from 
the same population. 

Let $x_i$ $(i = 1, 2, 3, \ldots, n_1)$ and $X_j$ $(j = 1, 2, 3, \ldots, n_2)$ be 
two independent samples with means $\bar{x}$ and $\bar{X}$ respectively. 
Unbiased estimates of the population variance are: 

$$u^2 = n_1 s_1^2/(n_1 - 1) \quad\text{and}\quad v^2 = n_2 s_2^2/(n_2 - 1),$$

where $s_1^2$ and $s_2^2$ are the respective sample variances. If 
$\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$, 

$$s_1^2 = \nu_1 u^2/(\nu_1 + 1) \quad\text{and}\quad s_2^2 = \nu_2 v^2/(\nu_2 + 1) \qquad (8.6.1)$$

Now the sample variance, $s^2$, has the probability differential 
(7.11.4) 

$$dp(s^2) = \frac{(n/2\sigma^2)^{(n-1)/2}}{\Gamma((n-1)/2)}\,\exp(-ns^2/2\sigma^2)\,(s^2)^{(n-3)/2}\,d(s^2)$$

Substituting from (8.6.1), the probability differential of $u^2$ is 

$$dp(u^2) = \frac{(\nu_1/2\sigma^2)^{\nu_1/2}}{\Gamma(\nu_1/2)}\,(u^2)^{(\nu_1-2)/2}\,\exp(-\nu_1 u^2/2\sigma^2)\,d(u^2) \qquad (8.6.2)$$

But u and v are independent; therefore the joint probability 
differential of $u^2$ and $v^2$ is 

$$dp(u^2, v^2) = \frac{(\nu_1/2\sigma^2)^{\nu_1/2}(\nu_2/2\sigma^2)^{\nu_2/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)}\,(u^2)^{(\nu_1-2)/2}(v^2)^{(\nu_2-2)/2}$$
$$\times \exp[-(\nu_1 u^2 + \nu_2 v^2)/2\sigma^2]\,d(u^2)\,d(v^2) \qquad (8.6.3)$$

Now let 

$$z = \log_e(u/v) = \tfrac{1}{2}\log_e(u^2/v^2) \qquad (8.6.4)$$

Then $u^2 = v^2\exp(2z)$, and, for a given v, $d(u^2) = 2v^2\exp(2z)\,dz$. 
Therefore 

$$dp(v^2, z) = \frac{2(\nu_1/2\sigma^2)^{\nu_1/2}(\nu_2/2\sigma^2)^{\nu_2/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)}\,\exp(\nu_1 z)$$
$$\times \exp[-(\nu_1 e^{2z} + \nu_2)v^2/2\sigma^2]\,(v^2)^{(\nu_1+\nu_2-2)/2}\,d(v^2)\,dz \qquad (8.6.5)$$

To find the probability differential of z, we integrate this with 
respect to $v^2$ between 0 and ∞, obtaining 

$$dp(z) = \frac{2(\nu_1/2\sigma^2)^{\nu_1/2}(\nu_2/2\sigma^2)^{\nu_2/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)}\,\exp(\nu_1 z)\,dz \times \int_0^\infty \exp[-(\nu_1 e^{2z} + \nu_2)v^2/2\sigma^2]\,(v^2)^{(\nu_1+\nu_2-2)/2}\,d(v^2)$$

Recalling that 

$$\Gamma(n) = \int_0^\infty x^{n-1}\exp(-x)\,dx,$$

we put $x = (\nu_1 e^{2z} + \nu_2)v^2/2\sigma^2$. The integral then becomes 
$\Gamma((\nu_1+\nu_2)/2)\,[2\sigma^2/(\nu_1 e^{2z} + \nu_2)]^{(\nu_1+\nu_2)/2}$, the powers of 
$\sigma^2$ cancel, and we have 

$$dp(z) = \frac{2\,\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\nu_1/2,\ \nu_2/2)} \cdot \frac{\exp(\nu_1 z)}{(\nu_1 e^{2z} + \nu_2)^{(\nu_1+\nu_2)/2}}\,dz \qquad (8.6.7)$$

This defines Fisher's z-distribution, which, it should be noted, 
is independent of the variance of the parent population. 
How do we use it? 

The probability, $P(z \le Z)$, that z will be not greater than some 
given value Z for $\nu_1$, $\nu_2$ degrees of freedom is $\int_{-\infty}^{Z} dp(z)$. Then 
$P(z > Z) = 1 - P(z \le Z) = 1 - \int_{-\infty}^{Z} dp(z)$. In his book 
Fisher gives tables setting down the values of Z exceeded with 
probabilities 0.05 and 0.01 for given $\nu_1$ and $\nu_2$. He calls these 
values, $Z_{0.05}$ and $Z_{0.01}$, the "5% and 1% points" of z. 

To obviate the necessity of using logarithms, Snedecor 
(Statistical Methods, Collegiate Press, Inc., Ames, Iowa) 
tabulated the 5% and 1% points of the variance ratio, $u^2/v^2$, which 
he denotes by F, in honour of Fisher, instead of $z = \tfrac{1}{2}\log_e F$. 
Substituting $F = u^2/v^2$, where $u^2$ is the larger of the two estimates 
of the population variance, or $F = \exp(2z)$, in (8.6.7), we have ¹ 

$$dp(F) = \frac{\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\nu_1/2,\ \nu_2/2)} \cdot \frac{F^{(\nu_1-2)/2}}{(\nu_1 F + \nu_2)^{(\nu_1+\nu_2)/2}}\,dF \qquad (8.6.8)$$

In the F-table, Table 8.6, the d.f. $\nu_1$, of the larger estimate, 
give the column required, and $\nu_2$, the d.f. of the smaller estimate, 
the row required. At the intersection of the appropriate 
column and row we find two figures: the upper figure is that 
value of F exceeded with a probability of 0.05, the 5% point; 
the lower figure is that value exceeded with a probability of 
0.01, the 1% point. 

We may now return to the problem at the beginning of this 
section. 

¹ Writing 

$$x = \nu_1 F/(\nu_1 F + \nu_2) \quad\text{or}\quad F = \nu_2 x/\nu_1(1 - x)$$

we have 

$$dp(x) = \frac{1}{B(\nu_1/2,\ \nu_2/2)}\,x^{\nu_1/2 - 1}(1 - x)^{\nu_2/2 - 1}\,dx$$

Hence $P(x \le X) = \frac{1}{B(\nu_1/2,\ \nu_2/2)}\int_0^X x^{\nu_1/2 - 1}(1 - x)^{\nu_2/2 - 1}\,dx$ 

and the integral is $B_X(\nu_1/2, \nu_2/2)$. Thus $P(x \le X)$ is the 
Incomplete B-function Ratio, $I_X(\nu_1/2, \nu_2/2)$, and can be found from the 
appropriate tables (see Mathematical Note to Chapter Three, D). 
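
As a rough check on the tabulated points, the probability that F exceeds a given value can be found by integrating the density of the variance ratio numerically. The sketch below uses the standard form of that density, with constant $\nu_1^{\nu_1/2}\nu_2^{\nu_2/2}/B(\nu_1/2, \nu_2/2)$, and Simpson's rule; the function names are assumptions made for illustration, and the routine is only intended for $\nu_1 \ge 2$, where the density is finite at F = 0.

```python
import math


def f_density(f, nu1, nu2):
    """Density of the variance ratio F for nu1 and nu2 degrees of freedom."""
    log_beta = (math.lgamma(nu1 / 2) + math.lgamma(nu2 / 2)
                - math.lgamma((nu1 + nu2) / 2))
    log_const = (nu1 / 2) * math.log(nu1) + (nu2 / 2) * math.log(nu2) - log_beta
    return (math.exp(log_const) * f ** ((nu1 - 2) / 2)
            / (nu1 * f + nu2) ** ((nu1 + nu2) / 2))


def prob_f_exceeds(f_value, nu1, nu2, steps=4000):
    """P(F > f_value), by Simpson's rule on [0, f_value]; steps must be even."""
    h = f_value / steps
    total = f_density(0.0, nu1, nu2) + f_density(f_value, nu1, nu2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_density(k * h, nu1, nu2)
    return 1.0 - total * h / 3


# The 5% point for nu1 = nu2 = 9 is about 3.18, so this should be close to 0.05.
print(round(prob_f_exceeds(3.18, 9, 9), 3))
```

The same routine applied to $\nu_1 = 2$, $\nu_2 = 10$ and F = 7.56 gives very nearly 0.01, in agreement with the 1% point for those degrees of freedom.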
Table 8.6. 5% and 1% Points for the Distribution of the 
Variance Ratio, F 

(Adapted, by permission of the author and publishers, from 
Table 10.5.3 of Statistical Methods by G. W. Snedecor (5th Edition, 
1956, pp. 246-249).) 

For each value of ν2 the upper line gives the 5% points and the 
lower line the 1% points. 

 ν2 \ ν1        1       2       3       4       5       6       8      12      24       ∞ 

   1   5%     161     200     216     225     230     234     239     244     249     254 
       1%    4052    4999    5403    5625    5764    5859    5981    6106    6234    6366 
   2   5%   18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50 
       1%   98.50   99.00   99.17   99.25   99.30   99.33   99.37   99.42   99.46   99.50 
   3   5%   10.13    9.55    9.28    9.12    9.01    8.94    8.85    8.74    8.64    8.53 
       1%   34.12   30.82   29.46   28.71   28.24   27.91   27.49   27.05   26.60   26.13 
   4   5%    7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63 
       1%   21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37   13.93   13.46 
   5   5%    6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36 
       1%   16.26   13.27   12.06   11.39   10.97   10.67   10.29    9.89    9.47    9.02 
   6   5%    5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67 
       1%   13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72    7.31    6.88 
   7   5%    5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23 
       1%   12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47    6.07    5.65 
   8   5%    5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93 
       1%   11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67    5.28    4.86 
   9   5%    5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71 
       1%   10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11    4.73    4.31 
  10   5%    4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54 
       1%   10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71    4.33    3.91 
  11   5%    4.84    3.98    3.59    3.36    3.20    3.09    2.95    2.79    2.61    2.40 
       1%    9.65    7.20    6.22    5.67    5.32    5.07    4.74    4.40    4.02    3.60 
  12   5%    4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30 
       1%    9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16    3.78    3.36 
  13   5%    4.67    3.80    3.41    3.18    3.02    2.92    2.77    2.60    2.42    2.21 
       1%    9.07    6.70    5.74    5.20    4.86    4.62    4.30    3.96    3.59    3.16 
  14   5%    4.60    3.74    3.34    3.11    2.96    2.85    2.70    2.53    2.35    2.13 
       1%    8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80    3.43    3.00 
  15   5%    4.54    3.68    3.29    3.06    2.90    2.79    2.64    2.48    2.29    2.07 
       1%    8.68    6.36    5.42    4.89    4.56    4.32    4.00    3.67    3.29    2.87 
  16   5%    4.49    3.63    3.24    3.01    2.85    2.74    2.59    2.42    2.24    2.01 
       1%    8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55    3.18    2.75 
  17   5%    4.45    3.59    3.20    2.96    2.81    2.70    2.55    2.38    2.19    1.96 
       1%    8.40    6.11    5.18    4.67    4.34    4.10    3.79    3.46    3.08    2.65 
  18   5%    4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92 
       1%    8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37    3.00    2.57 
  19   5%    4.38    3.52    3.13    2.90    2.74    2.63    2.48    2.31    2.11    1.88 
       1%    8.18    5.93    5.01    4.50    4.17    3.94    3.63    3.30    2.92    2.49 
  20   5%    4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84 
       1%    8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23    2.86    2.42 
  21   5%    4.32    3.47    3.07    2.84    2.68    2.57    2.42    2.25    2.05    1.81 
       1%    8.02    5.78    4.87    4.37    4.04    3.81    3.51    3.17    2.80    2.36 
  22   5%    4.30    3.44    3.05    2.82    2.66    2.55    2.40    2.23    2.03    1.78 
       1%    7.94    5.72    4.82    4.31    3.99    3.76    3.45    3.12    2.75    2.31 
  23   5%    4.28    3.42    3.03    2.80    2.64    2.53    2.38    2.20    2.00    1.76 
       1%    7.88    5.66    4.76    4.26    3.94    3.71    3.41    3.07    2.70    2.26 
  24   5%    4.26    3.40    3.01    2.78    2.62    2.51    2.36    2.18    1.98    1.73 
       1%    7.82    5.61    4.72    4.22    3.90    3.67    3.36    3.03    2.66    2.21 
  30   5%    4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62 
       1%    7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47    2.01 
  40   5%    4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51 
       1%    7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.80 
  60   5%    4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39 
       1%    7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60 
 120   5%    3.92    3.07    2.68    2.45    2.29    2.17    2.02    1.83    1.61    1.25 
       1%    6.85    4.79    3.95    3.48    3.17    2.96    2.66    2.34    1.95    1.38 
   ∞   5%    3.84    2.99    2.60    2.37    2.21    2.09    1.94    1.75    1.52    1.00 
       1%    6.64    4.60    3.78    3.32    3.02    2.80    2.51    2.18    1.79    1.00 


Note to Table 8.6 

(1) To find the 5% and 1% points for values of ν1 or ν2 not given in the above table 
when ν1 > 8 and ν2 > 24 we proceed as illustrated below: 

(a) To find the 5% point of F when ν1 = 200, ν2 = 18, enter the Table at ν2 = 18. 
The 5% point for ν1 = 24 is 2.15; the 5% point for ν1 = ∞ is 1.92. Divide 
24/24 = 1; divide 24/∞ = 0; divide 24/200 = 0.12. The difference between the 
two given 5% points is 0.23. 0.12 of this difference is 0.12 x 0.23 = 0.0276. We 
add this to 1.92, obtaining 1.95, correct to two decimal places. 

(b) To find the 1% point of F when ν1 = 11, ν2 = 21, enter the Table at ν2 = 21. 
The 1% point when ν1 = 8 is 3.51 and when ν1 = 12 is 3.17. 24/8 = 3; 24/12 = 
2; 24/11 = 2.18. The difference between the two known 1% points is 
3.51 - 3.17 = 0.34. 0.18 x 0.34 = 0.06. Hence the required 1% point is 
3.17 + 0.06 = 3.23. 

(c) To find the 5% point of F when ν1 = 4, ν2 = 55, enter the Table at ν1 = 4. The 
5% point for ν2 = 40 is 2.61; the 5% point for ν2 = 60 is 2.52. 120/40 = 3; 
120/60 = 2; 120/55 = 2.18. 2.61 - 2.52 = 0.09. 0.18 x 0.09 = 0.016. The 
required 5% point is 2.52 + 0.016 = 2.54, correct to two decimal places. 

(d) To find the 1% point of F when ν1 = 12, ν2 = 500, enter the Table at ν1 = 12. 
The 1% point for ν2 = 120 is 2.34; the 1% point for ν2 = ∞ is 2.18. 120/120 = 1; 
120/∞ = 0; 120/500 = 0.24. 2.34 - 2.18 = 0.16. 0.24 x 0.16 = 0.038. The 
required 1% point is 2.18 + 0.038 = 2.22, correct to two decimal places. 

(2) If we make the substitution F = t² in (8.6.8), simultaneously putting ν1 = 1 and 
ν2 = ν, we find that the probability differential of F transforms into that for t. Thus 
we may use the F-tables to find the 5% and 1% points of t. They are, in fact, the 
square roots of the 5% and 1% points of F for ν1 = 1. (See also 10.3.) 
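
The rule illustrated in (a) to (d) amounts to linear interpolation in the reciprocal of the degrees of freedom; the multipliers 24 and 120 cancel when the fraction is formed. A minimal sketch follows; the function name `interpolate_f_point` is an assumption made for illustration.

```python
import math


def interpolate_f_point(nu, nu_low, point_low, nu_high, point_high):
    """Interpolate a tabulated F point linearly in 1/nu.

    nu_low < nu < nu_high are degrees of freedom (nu_high may be math.inf);
    point_low and point_high are the tabulated points at nu_low and nu_high.
    """
    def reciprocal(x):
        return 0.0 if math.isinf(x) else 1.0 / x

    fraction = ((reciprocal(nu) - reciprocal(nu_high))
                / (reciprocal(nu_low) - reciprocal(nu_high)))
    return point_high + fraction * (point_low - point_high)


# Case (a): nu1 = 200 lies between nu1 = 24 (2.15) and nu1 = infinity (1.92).
print(round(interpolate_f_point(200, 24, 2.15, math.inf, 1.92), 2))
# Case (b): nu1 = 11 lies between nu1 = 8 (3.51) and nu1 = 12 (3.17).
print(round(interpolate_f_point(11, 8, 3.51, 12, 3.17), 2))
```

The same function serves for interpolation along ν2, as in cases (c) and (d).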






We tabulate the working as follows: 

 Reading of 
 voltmeter A 
 (x).            (x - 1.10).    (x - 1.10)². 

   1.11             0.01          0.0001 
   1.15             0.05          0.0025 
   1.14             0.04          0.0016 
   1.10             0.00          0.0000 
   1.09            -0.01          0.0001 
   1.11             0.01          0.0001 
   1.12             0.02          0.0004 
   1.15             0.05          0.0025 
   1.13             0.03          0.0009 
   1.14             0.04          0.0016 

   Totals           0.24          0.0098 

$\bar{x} = 1.10 + 0.24/10 = 1.124$ 

$s_x^2 = \sum (x - 1.10)^2/10 - (\bar{x} - 1.10)^2 = 0.00098 - (0.024)^2 = 0.000404$ 

$\therefore\ s_x = 0.0201$ 

 Reading of 
 voltmeter B 
 (X).            (X - 1.10).    (X - 1.10)². 

   1.12             0.02          0.0004 
   1.06            -0.04          0.0016 
   1.02            -0.08          0.0064 
   1.08            -0.02          0.0004 
   1.11             0.01          0.0001 
   1.05            -0.05          0.0025 
   1.06            -0.04          0.0016 
   1.03            -0.07          0.0049 
   1.05            -0.05          0.0025 
   1.08            -0.02          0.0004 

   Totals          -0.34          0.0208 

$\bar{X} = 1.10 - 0.034 = 1.066$ 

$s_X^2 = 0.00208 - (0.034)^2 = 0.000924$ 

$\therefore\ s_X = 0.0304$ 
For voltmeter A: $|t| = 0.024 \times 9^{1/2}/0.0201 = 3.58$. Entering 
Table 8.1 at ν = 9 we find that the value of t exceeded with a 
probability of 0.01 is 3.25. The result is therefore significant at the 
1% level. Since the value of t here is positive, the voltmeter A 
definitely reads high. 

For voltmeter B: $|t| = 0.0340 \times 9^{1/2}/0.0304 = 3.36$. Once again 
the value of t is significant at the 1% level and we conclude that, 
since t is here negative, the voltmeter reads low. 

To test whether there is evidence that one voltmeter is more 
consistent than the other, we set up the null hypothesis that there 
is no difference in consistency. In other words, we assume that the 
samples are from populations of the same variance. 

$F = u^2/v^2$, where $u^2 > v^2$ and $u^2$ and $v^2$ are unbiased estimates of 
the population variance based on, in this case, the same number of 
degrees of freedom, 9. Since the samples are of equal size, we have 
$F = s_X^2/s_x^2 = 0.000924/0.000404 = 2.29$. 

Entering Table 8.6 at ν2 = 9, we read that the 5% point of F for 
ν1 = 8 is 3.23, while that for ν1 = 12 is 3.07. 24/8 = 3; 24/12 = 
2; 24/9 = 2.67. 3.23 - 3.07 = 0.16. 0.16 x 0.67 = 0.107. 
Therefore the 5% point of F for ν1 = ν2 = 9 is 3.177 = 3.18, correct 
to two decimal places. The value of F obtained, 2.29, is, therefore, 
not significant at the 5% level, and we have no reason to reject the 
hypothesis that there is no difference in consistency between the two 
voltmeters. 
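
The whole of this calculation can be reproduced from the twenty readings. The sketch below is illustrative only: the function names are assumptions, variances are computed with divisor n and converted to unbiased estimates before the ratio is taken, and the larger estimate is placed in the numerator.

```python
import math

STANDARD = 1.10
meter_a = [1.11, 1.15, 1.14, 1.10, 1.09, 1.11, 1.12, 1.15, 1.13, 1.14]
meter_b = [1.12, 1.06, 1.02, 1.08, 1.11, 1.05, 1.06, 1.03, 1.05, 1.08]


def mean_and_variance(values):
    """Return (mean, s2), where s2 is the sample variance with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


def bias_t(values, true_value):
    """t for the hypothesis that the population mean equals true_value."""
    n = len(values)
    mean, s2 = mean_and_variance(values)
    return (mean - true_value) * math.sqrt(n - 1) / math.sqrt(s2)


def variance_ratio(sample_1, sample_2):
    """F = larger unbiased variance estimate divided by the smaller."""
    estimates = []
    for sample in (sample_1, sample_2):
        n = len(sample)
        _, s2 = mean_and_variance(sample)
        estimates.append(n * s2 / (n - 1))
    return max(estimates) / min(estimates)


print(f"t(A) = {bias_t(meter_a, STANDARD):.2f}")
print(f"t(B) = {bias_t(meter_b, STANDARD):.2f}")
print(f"F    = {variance_ratio(meter_a, meter_b):.2f}")
```

With samples of equal size the factors n/(n - 1) cancel, which is why the text can form F directly from the two sample variances.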

EXERCISES ON CHAPTER EIGHT 

1. A sample of 14 eggs of a particular species of wild bird collected 
in a given area is found to have a mean length of 0.89 cm. and a 
standard deviation of 0.154 cm. Is this compatible with the 
hypothesis that the mean length of the eggs of this bird is 0.99 cm.? 

2. A group of 8 psychology students were tested for their ability 
to remember certain material, and their scores (number of items 
remembered) were as follows: 

A B C D E F G H 
19 14 13 16 19 18 16 17 

They were then given special training purporting to improve 
memory and were retested after a month. Scores then: 

A B C D E F G H 
26 20 17 21 23 24 21 18 

A control group of 7 students was tested and retested after a month, 
but was given no special training. Scores in two tests: 

21 19 16 22 18 20 19 
21 23 16 24 17 17 16 

Compare the change in the two groups by calculating t and test 
whether there is significant evidence to show the value of the special 
training. Do you consider that the experiment was properly 
designed? (R.S.S.) 

3. A sample of 6 values from an unknown normal population: 
20, 23, 24, 28, 22, 26. Another sample of 5 values: 21, 24, 27, 
26, 25. Show that there is no good reason to suppose that the 
samples are not from the same population. 

4. Two marksmen, P and Q, on 25 targets each, obtained the 
scores tabulated below. Ascertain whether one marksman may be 
regarded as the more consistent shot. 

Score . . . . .   93  94  95  96  97  98  99  100   Total 
Frequency   P      2   1   4   0   5   5   2    6     25 
            Q      0   2   2   3   3   8   5    2     25 

(I.A.) 

5. Latter has given the following data for the length in mm. of 
cuckoo's eggs which were found in nests belonging to the 
hedge-sparrow (A), reed-warbler (B) and wren (C): 

Host. 

A  22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1, 23.1, 
   23.5, 23.0, 23.0 
B  23.2, 22.0, 22.2, 21.2, 21.6, 21.6, 21.9, 22.0, 22.9, 22.8 
C  19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0, 20.3, 20.9, 22.0, 
   20.0, 20.8, 21.2, 21.0 

Is there any evidence from these data that the cuckoo can adapt the 
size of its egg to the size of the nest of the host? 

Solutions 

1. Not significant at 0.02 level. 

2. Evidence of improvement in test group highly significant; 
that of control group highly insignificant. Initial scores in control 
group too high for control to be useful. 

4. F not significant at 0.05 point, i.e., although there is evidence 
that Q is more consistent this could arise from random variation 
alone. 



CHAPTER NINE 

ANALYSIS OF VARIANCE 

9.1. The Problem Stated. The test of significance of the 
variance-ratio, F, described in the previous chapter encourages 
us to embark on a much wider investigation, that of the analysis 
of variance. This important statistical technique has been 
defined by its originator, Sir R. A. Fisher, as 

"The separation of the variance ascribable to one group 
of causes from the variance ascribable to other groups" 
(Statistical Methods for Research Workers, Eleventh Edition, 
1950, p. 211). 

Suppose that from a large herd of cows we pick fifty animals 
at random and record the milk-yield of each over a given 
period. These fifty amounts (in gallons, say) are a sample of 
fifty values of our variate. Now the herd may consist, perhaps, 
of five different breeds— Ayrshire, Jersey, etc. — and we want 
to find an answer to the following problem, using the evidence 
provided by our sample : 

Does milk-yield vary with the breed of cow ? Or, in 
other words, are milk-yield and breed connected ? 

As a first step towards answering this question, it would be 
reasonable to divide our sample into five sub-samples or 
classes, according to breed. Then if we could split up the 
total variance of our sample into two components — that due 
to variation between the mean milk-yields of different breeds 
and that due to variation of yield within breeds, we could 
subject these two components to further scrutiny. 

To do this we first set up the null hypothesis that the factor 
according to which we classify the population and, therefore, the 
sample values of the variate, has no effect on the value of the variate, 
i.e., in our present case that breed (the factor of classification) 
does not influence milk-yield. If, indeed, this is the case, 
each class into which we divide our sample will itself be a 
random sample from one and the same population. Con- 
sequently any unbiased estimates we may make of the popula- 
tion variance on the basis of these sub-samples should be 
compatible and, on the further assumption that the population 
sampled is normal, these estimates should not differ significantly 
when subjected to a variance-ratio test. However, should they 
be found to differ significantly, we should have to conclude that 
our sub-samples are not random samples from one homogeneous 
population, but are in fact drawn from several different popula- 
tions brought into being as it were by our method of classifica- 
tion. We should have to conclude, in short, that our null 
hypothesis was untenable and that milk-yield and breed are 
connected. 

In practice, of course, the problem is seldom as simple as 
this and may involve more than one criterion of classification. 
We may, for instance, have to analyse, not merely the influence 
of breed on milk-yield, but also that of different varieties of 
feeding-stuffs. This would present us with a problem of 
analysis of variance with two criteria of classification. Prob- 
lems arising from three or more criteria are also common. 
Although the general principle underlying the treatment of all 
such problems is the same, each presents its own particular 
problems. 

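9.2. One Criterion of Classification. Consider a random 
sample of N values of a given variate x. Let us classify these 
N values into m classes according to some criterion of 
classification and let the ith class have $n_i$ members. Then 
$\sum_{i=1}^{m} n_i = N$. 
Also, let the jth member of the ith class be $x_{ij}$. The sample 
values may then be set out as follows: 

Class 1 .   $x_{11}\quad x_{12}\quad \ldots\quad x_{1j}\quad \ldots\quad x_{1n_1}$ 
Class 2 .   $x_{21}\quad x_{22}\quad \ldots\quad x_{2j}\quad \ldots\quad x_{2n_2}$ 
  . . . 
Class i .   $x_{i1}\quad x_{i2}\quad \ldots\quad x_{ij}\quad \ldots\quad x_{in_i}$ 
  . . . 
Class m .   $x_{m1}\quad x_{m2}\quad \ldots\quad x_{mj}\quad \ldots\quad x_{mn_m}$ 

It is frequently the case that, for all i, $n_i = n$, i.e., each 
class has n members and, consequently, N = mn. 

Let the mean of the ith class be $\bar{x}_{i.}$ and the general mean of 
the N values be $\bar{x}_{..}$. Then, for all i, 

$$\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.}) = 0 \qquad (9.2.1)$$

Consequently, 

$$\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.} + \bar{x}_{i.} - \bar{x}_{..})^2$$

$$= \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2 + n_i(\bar{x}_{i.} - \bar{x}_{..})^2 + 2(\bar{x}_{i.} - \bar{x}_{..}) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})$$

$$= \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2 + n_i(\bar{x}_{i.} - \bar{x}_{..})^2 \quad \text{(in virtue of (9.2.1))}$$

Hence 

$$\sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2 = \sum_{i=1}^{m} n_i(\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2 \qquad (9.2.2)$$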

The left-hand member of this equation is the total sum of 
the squared deviations of the sample values of the variate from 
the general mean; it is a measure of the "total variation". 
The right-hand side of the equation shows that this "total 
variation" may be resolved into two components: 

one, measured by the first term on the right-hand side, is 
the variation which would have resulted had there been 
no variation within the classes; this is easily seen by 
putting $x_{ij} = \bar{x}_{i.}$ for all i; it is therefore the variation 
between classes; 

the other, measured by the second term on the right-hand 
side, is the residual variation within classes, after the 
variation between classes has been separated out from 
the total variation. 

In short, 

Total variation = Variation between classes 

+ Variation within classes 
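
The identity (9.2.2) holds for any classified set of numbers, and this is easily confirmed numerically. The sketch below is an illustration only: the function name `variation_components` is an assumption, and the three small classes are hypothetical figures chosen merely to exercise it.

```python
def variation_components(classes):
    """Return (total, between, within) sums of squares for a list of classes."""
    all_values = [x for cls in classes for x in cls]
    general_mean = sum(all_values) / len(all_values)
    total = sum((x - general_mean) ** 2 for x in all_values)
    between = 0.0
    within = 0.0
    for cls in classes:
        class_mean = sum(cls) / len(cls)
        # n_i times the squared deviation of the class mean from the general mean
        between += len(cls) * (class_mean - general_mean) ** 2
        # squared deviations of the members from their own class mean
        within += sum((x - class_mean) ** 2 for x in cls)
    return total, between, within


# Hypothetical classes of unequal size, for illustration only.
classes = [[3, 5, 4], [8, 9], [1, 2, 2, 3]]
total, between, within = variation_components(classes)
print(abs(total - (between + within)) < 1e-9)
```

The check prints True whatever figures are supplied, since the cross-product term vanishes class by class in virtue of (9.2.1).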

Assuming that the classifying factor does not influence 
variate-values, each class into which the sample is divided by 
this factor will be a random sample from the parent population. 
Taking expected values of the terms in (9.2.2): 

$$E\Big(\sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2\Big) = (N - 1)\sigma^2,$$

where $\sigma^2$ is the variance of the parent population. Also 

$$E\Big(\sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2\Big) = \sum_{i=1}^{m} (n_i - 1)\sigma^2 = (N - m)\sigma^2.$$

Consequently, 

$$E\Big(\sum_{i=1}^{m} n_i(\bar{x}_{i.} - \bar{x}_{..})^2\Big) = (N - 1)\sigma^2 - (N - m)\sigma^2 = (m - 1)\sigma^2$$

Thus, providing our null hypothesis stands, (9.2.2) leads us 
to two unbiased estimates of $\sigma^2$, viz., 

$$\sum_{i=1}^{m} n_i(\bar{x}_{i.} - \bar{x}_{..})^2/(m - 1),$$

based on m - 1 degrees of freedom, and 

$$\sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2/(N - m),$$

based on N - m degrees of freedom. 

So far all we have said has been true for any population. 
We now impose the restriction that the population sampled be 
normal. With this restriction, the two unbiased estimates 
are also independent, and all the conditions for applying 
either Fisher's z-test or Snedecor's version of that test are 
fulfilled. 

We now draw up the following analysis of variance table: 

Analysis of Variance for One Criterion of Classification 

Source of          Sum of                                              Degrees of   Estimate of 
variation.         squares.                                            freedom.     variance. 

Between classes    $\sum_{i} n_i(\bar{x}_{i.} - \bar{x}_{..})^2$                 m - 1        $\sum_{i} n_i(\bar{x}_{i.} - \bar{x}_{..})^2/(m - 1)$ 

Within classes     $\sum_{i}\sum_{j} (x_{ij} - \bar{x}_{i.})^2$                  N - m        $\sum_{i}\sum_{j} (x_{ij} - \bar{x}_{i.})^2/(N - m)$ 

Total              $\sum_{i}\sum_{j} (x_{ij} - \bar{x}_{..})^2$                  N - 1 


Since, from the conditions of the problem, both N and m are 
greater than 1, the estimate of $\sigma^2$ from the variation within 
classes must, of necessity, be based upon more degrees of 
freedom than that of $\sigma^2$ from the variation between classes. 

It is reasonable, therefore, to take the estimate of $\sigma^2$ from 
the variation within classes as the more reliable estimate, 
even if the null hypothesis is untenable. If, then, the other 
estimate of $\sigma^2$, that from the variation between classes, is 
smaller than this, but not considerably so, we may 
straightaway conclude that there is no evidence upon which to 
reject the hypothesis. If, however, it is greater, we may test 
whether it is significantly greater by means of a variance-ratio 
test. 

9.3. Worked Example. 

Six machines produce steel wire. The following data give the 
diameters at ten positions along the wire for each machine. 
Examine whether the machine means can be regarded as constant. 

Machine.   Diameters in thousandths of an inch. 

   A       12   13   13   16   16   14   15   15   16   17 
   B       12   14   14   16   16   18   17   19   20   18 
   C       14   21   17   14   19   18   17   17   16   15 
   D       23   27   25   21   26   24   27   24   20   21 
   E       12   14   13   16   13   17   16   15   15   14 
   F       13   18   13   16   17   15   15   16   16   17 

(Paradine and Rivett) 

Treatment: (1) We set up the hypothesis that the machine means 
are constant. 

(2) We note that: 

(a) since the variance of a set of values is independent of the 
origin, a shift of origin does not affect variance-calculations. 
We may therefore choose any convenient origin which will 
reduce the arithmetic involved. We take our new origin at 
x = 20; 

(b) since we are concerned here only with the ratio of two 
variances, any change of scale will not affect the value of this ratio. 
This also contributes to reducing the arithmetic, but in the 
present example is unnecessary. 

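(3) We may further reduce the work of calculation as follows: 

Let $T = \sum_i \sum_j x_{ij}$ and $T_i = \sum_j x_{ij}$. 

Then (a) $\sum_i \sum_j (x_{ij} - \bar{x}_{..})^2 = \sum_i \sum_j x_{ij}^2 - N\bar{x}_{..}^2 = \sum_i \sum_j x_{ij}^2 - T^2/N$ 

(b) $\sum_i \sum_j (x_{ij} - \bar{x}_{i.})^2 = \sum_i \big(\sum_j x_{ij}^2 - T_i^2/n_i\big) = \sum_i \sum_j x_{ij}^2 - \sum_i (T_i^2/n_i)$ 

(c) $\sum_i n_i(\bar{x}_{i.} - \bar{x}_{..})^2 = \sum_i (T_i^2/n_i) - T^2/N$ 
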
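
The short-cut formulae (a), (b) and (c) translate directly into code. The following is a minimal sketch: the function name `sums_of_squares_from_totals` is an assumption, and the small classes are hypothetical figures used only to show that a change of origin leaves every sum of squares unaltered.

```python
def sums_of_squares_from_totals(classes):
    """Total, within and between sums of squares from class totals T_i and grand total T."""
    n_total = sum(len(cls) for cls in classes)                      # N
    grand_total = sum(sum(cls) for cls in classes)                  # T
    raw_squares = sum(x * x for cls in classes for x in cls)        # sum of x_ij^2
    class_term = sum(sum(cls) ** 2 / len(cls) for cls in classes)   # sum of T_i^2 / n_i
    correction = grand_total ** 2 / n_total                         # T^2 / N
    return {
        "total": raw_squares - correction,    # formula (a)
        "within": raw_squares - class_term,   # formula (b)
        "between": class_term - correction,   # formula (c)
    }


# Hypothetical classes; the second set is the first with the origin moved to 20.
classes = [[3, 5, 4], [8, 9], [1, 2, 2, 3]]
shifted = [[x - 20 for x in cls] for cls in classes]
print(sums_of_squares_from_totals(classes))
print(sums_of_squares_from_totals(shifted))
```

Both calls give the same three sums of squares, apart from rounding in the last decimal places.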

(4) In our example, $n_i = 10$, for all i, and m = 6. Therefore 
N = nm = 60. We set up the following table, in which each machine 
has three lines: the diameter x, the deviation x - 20, and the 
squared deviation (x - 20)². 

Machines                        Positions (n = 10)                                        Sum of 
(m = 6).          1    2    3    4    5    6    7    8    9   10      T_i    T_i²/n    squares 

A   x            12   13   13   16   16   14   15   15   16   17 
    x - 20       -8   -7   -7   -4   -4   -6   -5   -5   -4   -3      -53     280.9 
    (x - 20)²    64   49   49   16   16   36   25   25   16    9                         305 

B   x            12   14   14   16   16   18   17   19   20   18 
    x - 20       -8   -6   -6   -4   -4   -2   -3   -1    0   -2      -36     129.6 
    (x - 20)²    64   36   36   16   16    4    9    1    0    4                         186 

C   x            14   21   17   14   19   18   17   17   16   15 
    x - 20       -6    1   -3   -6   -1   -2   -3   -3   -4   -5      -32     102.4 
    (x - 20)²    36    1    9   36    1    4    9    9   16   25                         146 

D   x            23   27   25   21   26   24   27   24   20   21 
    x - 20        3    7    5    1    6    4    7    4    0    1      +38     144.4 
    (x - 20)²     9   49   25    1   36   16   49   16    0    1                         202 

E   x            12   14   13   16   13   17   16   15   15   14 
    x - 20       -8   -6   -7   -4   -7   -3   -4   -5   -5   -6      -55     302.5 
    (x - 20)²    64   36   49   16   49    9   16   25   25   36                         325 

F   x            13   18   13   16   17   15   15   16   16   17 
    x - 20       -7   -2   -7   -4   -3   -5   -5   -4   -4   -3      -44     193.6 
    (x - 20)²    49    4   49   16    9   25   25   16   16    9                         218 

T²/N = 182²/60 = 552.07                                          T = -182   1,153.4     1,382 

Here the last three columns give, for each machine, the total $T_i$ of 
the deviations, $T_i^2/n$, and the sum of the squared deviations; the 
bottom line gives T, $\sum_i T_i^2/n$ and $\sum_i \sum_j x_{ij}^2$ respectively. 


The analysis of variance table is, accordingly :

Source of            Sum of squares.               Degrees of     Estimate of    F.
variation.                                         freedom.       variance.

Between machines     1,153.4 - 552.1 = 601.3           5            120.3        28.6
Within machines      1,382.0 - 1,153.4 = 228.6     6 x 9 = 54         4.2

Total                829.9                            59
Entering Table 8.6 at $\nu_1 = 5$, $\nu_2 = 54$, we find that this value of
F is significant at the 1% point. We, therefore, reject the hypothesis
that there is no difference between the machine-means. In
other words, there is a significant variation, from machine to machine,
of the diameter of the wire produced.
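
The shortcut formulae of step (3) can be checked mechanically. The following is a minimal sketch in Python, not part of the original treatment; the function name `one_way_anova` and its `origin` argument are our own illustrative choices. It also confirms remark (2)(a), that a shift of origin leaves the sums of squares unchanged.

```python
# Diameters (thousandths of an inch) for machines A to F, as tabulated above.
diameters = {
    "A": [12, 13, 13, 16, 16, 14, 15, 15, 16, 17],
    "B": [12, 14, 14, 16, 16, 18, 17, 19, 20, 18],
    "C": [14, 21, 17, 14, 19, 18, 17, 17, 16, 15],
    "D": [23, 27, 25, 21, 26, 24, 27, 24, 20, 21],
    "E": [12, 14, 13, 16, 13, 17, 16, 15, 15, 14],
    "F": [13, 18, 13, 16, 17, 15, 15, 16, 16, 17],
}


def one_way_anova(groups, origin=0):
    """Return (between SS, within SS, F) using the T and T_i shortcut formulae."""
    shifted = [[x - origin for x in g] for g in groups]
    N = sum(len(g) for g in shifted)          # total number of observations
    m = len(shifted)                          # number of classes (machines)
    T = sum(sum(g) for g in shifted)          # grand total
    raw_ss = sum(x * x for g in shifted for x in g)
    class_term = sum(sum(g) ** 2 / len(g) for g in shifted)  # sum of T_i^2 / n_i
    correction = T * T / N                    # T^2 / N
    between = class_term - correction         # formula (c)
    within = raw_ss - class_term              # formula (b)
    F = (between / (m - 1)) / (within / (N - m))
    return between, within, F


between, within, F = one_way_anova(list(diameters.values()), origin=20)
print(round(between, 1), round(within, 1), round(F, 1))
```

Worked without intermediate rounding, F comes out a little below the 28.6 of the table, which is the quotient of the rounded estimates 120.3 and 4.2; the conclusion is unaffected.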

9.4. Two Criteria of Classification. Now let us consider the
case where there are two criteria of classification. Suppose, for
instance, we classify our cows not only according to breed but
also according to the variety of fodder given them. To examine
whether milk-yield varies significantly with breed and with diet,
we must analyse the variation in yield into three components,
that due to breed, that due to variety of fodder, and, finally,
the residual variation, due to unspecified or unknown causes,
assumed to be normal.

Let our sample be of N values of x, the variate, such that
N = nm, and let us classify it according to some factor A into
m classes, and, according to another factor B, into n classes.
Let the sample variate-value in the ith A-class and jth B-class
be $x_{ij}$. The reader should verify, using the method of
the last section, that

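$$\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{..})^2 = \sum_{i=1}^{m} n(\bar{x}_{i.}-\bar{x}_{..})^2 + \sum_{j=1}^{n} m(\bar{x}_{.j}-\bar{x}_{..})^2 + \sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x}_{..})^2 \qquad (9.4.1)$$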



The first term on the right-hand side of this equation is the sum 
of the squared deviations from the general mean if all variation 
within the A-classes is eliminated, i.e., if each item in an A-class 
is replaced by the mean value of that class ; the second term 
is the sum of squared deviations from the general mean if all 
variation within B-classes is eliminated ; while the third term, 
the residual term, measures the variation in x remaining after 
the variation due to that between A-classes and that between 
B-classes has been separated out ; once again we assume it to 
be normal and due to unspecified or unknown causes. 

Since it has been shown (A. T. Craig, " On the Difference 
between Two Sample Variates ", National Mathematical 
Magazine, vol. II. (1937), pp. 259-262) that the three terms 
on the right are independent, we have 

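$$\mathcal{E}\Big(\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{..})^2\Big) = (mn-1)\sigma^2 ;$$

$$\mathcal{E}\Big(\sum_{i=1}^{m} n(\bar{x}_{i.}-\bar{x}_{..})^2\Big) = (m-1)\sigma^2 ;$$

and

$$\mathcal{E}\Big(\sum_{j=1}^{n} m(\bar{x}_{.j}-\bar{x}_{..})^2\Big) = (n-1)\sigma^2 ;$$

$$\mathcal{E}\Big(\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x}_{..})^2\Big) = (mn-1)\sigma^2 - (m-1)\sigma^2 - (n-1)\sigma^2 = (m-1)(n-1)\sigma^2. \qquad (9.4.2)$$

Here $\sigma^2$ is, of course, the variance of the parent population,
assumed homogeneous with respect to the factors of classification,
A and B.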

The analysis of variance table is, therefore : 



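Analysis of Variance for Two Criteria of Classification

Source of            Sum of squares.                                                                   Degrees of         Estimate of variance.
variation.                                                                                             freedom.

Between A-classes    $\sum_{i=1}^{m} n(\bar{x}_{i.}-\bar{x}_{..})^2$                                    m - 1              $\sum_{i=1}^{m} n(\bar{x}_{i.}-\bar{x}_{..})^2/(m-1)$

Between B-classes    $\sum_{j=1}^{n} m(\bar{x}_{.j}-\bar{x}_{..})^2$                                    n - 1              $\sum_{j=1}^{n} m(\bar{x}_{.j}-\bar{x}_{..})^2/(n-1)$

Residual (A x B)     $\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x}_{..})^2$    (m - 1)(n - 1)     $\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x}_{..})^2/(m-1)(n-1)$

Total                $\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{..})^2$                              mn - 1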


Let us call the three resulting estimates of $\sigma^2$, $Q_A$, $Q_B$ and
$Q_{A \times B}$ respectively. The test procedure is, then, as follows :

(a) Test $Q_A/Q_{A \times B}$ for m - 1 and (m - 1)(n - 1) degrees of
freedom using the F-table; and

(b) test $Q_B/Q_{A \times B}$ for n - 1 and (m - 1)(n - 1) degrees of
freedom using the F-table.

9.5. Three Criteria of Classification. We now assume that we
have N = lmn sample values of a given normal variate,
classified according to three criteria, A, B, C, into l groups of
m rows and n columns. Let $x_{ijk}$ be the value of the variate
in the jth row and kth column of the ith group (see Table 9.5).
i takes the values 1, 2, 3, . . . l; j, the values 1, 2, 3, . . . m ;
and k, the values 1, 2, 3, . . . n.

We shall use the following notation :

$\bar{x}_{...}$ = general mean ;

$\bar{x}_{ij.} = \Big(\sum_{k=1}^{n} x_{ijk}\Big)/n$ = mean of values in ith group, jth row ;

$\bar{x}_{i.k} = \Big(\sum_{j=1}^{m} x_{ijk}\Big)/m$ = mean of values in ith group, kth column ;

$\bar{x}_{.jk} = \Big(\sum_{i=1}^{l} x_{ijk}\Big)/l$ = mean of values in jth row, kth column ;

$\bar{x}_{i..} = \Big(\sum_{j=1}^{m}\sum_{k=1}^{n} x_{ijk}\Big)/mn$ = mean of ith group ;

$\bar{x}_{.j.} = \Big(\sum_{i=1}^{l}\sum_{k=1}^{n} x_{ijk}\Big)/ln$ = mean of jth row ;

$\bar{x}_{..k} = \Big(\sum_{i=1}^{l}\sum_{j=1}^{m} x_{ijk}\Big)/lm$ = mean of kth column.

Table 9.5 will make this notation clear.
The identity upon which the analysis of variance table is 
based is : 

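$$\sum_{i=1}^{l}\sum_{j=1}^{m}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_{...})^2 = mn\sum_{i=1}^{l}(\bar{x}_{i..}-\bar{x}_{...})^2 + ln\sum_{j=1}^{m}(\bar{x}_{.j.}-\bar{x}_{...})^2 + lm\sum_{k=1}^{n}(\bar{x}_{..k}-\bar{x}_{...})^2$$

$$+\ l\sum_{j=1}^{m}\sum_{k=1}^{n}(\bar{x}_{.jk}-\bar{x}_{.j.}-\bar{x}_{..k}+\bar{x}_{...})^2 + m\sum_{i=1}^{l}\sum_{k=1}^{n}(\bar{x}_{i.k}-\bar{x}_{i..}-\bar{x}_{..k}+\bar{x}_{...})^2 + n\sum_{i=1}^{l}\sum_{j=1}^{m}(\bar{x}_{ij.}-\bar{x}_{i..}-\bar{x}_{.j.}+\bar{x}_{...})^2$$

$$+\ \sum_{i=1}^{l}\sum_{j=1}^{m}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_{.jk}-\bar{x}_{i.k}-\bar{x}_{ij.}+\bar{x}_{i..}+\bar{x}_{.j.}+\bar{x}_{..k}-\bar{x}_{...})^2 \qquad (9.5.1)$$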

The resulting analysis of variance table is given on page 175. 
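
The reader who wishes to convince himself of (9.5.1) without the algebra may check it numerically. The sketch below, in Python, computes each of the eight sums directly from its definition; the function name `three_way_components` and the small 2 x 2 x 3 sample are hypothetical, chosen only for illustration, and are not data from the text.

```python
from itertools import product


def three_way_components(x):
    """x[i][j][k]: l groups, m rows, n columns. Return the eight sums of (9.5.1)."""
    l, m, n = len(x), len(x[0]), len(x[0][0])
    I, J, K = range(l), range(m), range(n)
    # General mean and the one-suffix means (groups, rows, columns).
    g = sum(x[i][j][k] for i, j, k in product(I, J, K)) / (l * m * n)
    a = [sum(x[i][j][k] for j in J for k in K) / (m * n) for i in I]
    b = [sum(x[i][j][k] for i in I for k in K) / (l * n) for j in J]
    c = [sum(x[i][j][k] for i in I for j in J) / (l * m) for k in K]
    # Two-suffix means.
    ab = [[sum(x[i][j][k] for k in K) / n for j in J] for i in I]
    ac = [[sum(x[i][j][k] for j in J) / m for k in K] for i in I]
    bc = [[sum(x[i][j][k] for i in I) / l for k in K] for j in J]
    return {
        "total": sum((x[i][j][k] - g) ** 2 for i, j, k in product(I, J, K)),
        "A": m * n * sum((a[i] - g) ** 2 for i in I),
        "B": l * n * sum((b[j] - g) ** 2 for j in J),
        "C": l * m * sum((c[k] - g) ** 2 for k in K),
        "BxC": l * sum((bc[j][k] - b[j] - c[k] + g) ** 2 for j in J for k in K),
        "AxC": m * sum((ac[i][k] - a[i] - c[k] + g) ** 2 for i in I for k in K),
        "AxB": n * sum((ab[i][j] - a[i] - b[j] + g) ** 2 for i in I for j in J),
        "residual": sum(
            (x[i][j][k] - bc[j][k] - ac[i][k] - ab[i][j] + a[i] + b[j] + c[k] - g) ** 2
            for i, j, k in product(I, J, K)
        ),
    }


# Hypothetical sample: 2 groups, 2 rows, 3 columns (illustration only).
sample = [
    [[1.5, 2.7, 3.4], [1.7, 1.9, 5.6]],
    [[1.9, 1.8, 2.0], [1.5, 2.9, 1.9]],
]
parts = three_way_components(sample)
right_hand_side = sum(v for name, v in parts.items() if name != "total")
print(abs(parts["total"] - right_hand_side) < 1e-9)
```

Each entry on the right of the identity corresponds to one row of the analysis of variance table, so the same function supplies all the sums of squares needed for a three-criteria analysis.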

9.6. The Meaning of " Interaction ". In the analysis of 
variance for three criteria of classification, we have spoken of 
" interactions " between the factors A, B and C. What do we 
mean by this term ? 

Our null hypothesis is that no one of the factors A, B, C
separately influences the variate-values. But it is conceivable
that any two of these factors acting together may do so.
When, therefore, we have three or more criteria of classification,
it is necessary to test the estimates of $\sigma^2$ based on the
" interaction " terms before testing the variations due to the
individual factors separately. If none of these " interactions ",
when tested against the residual estimate of $\sigma^2$, is significant,
we may pool the corresponding sums of squares, form a revised
estimate of $\sigma^2$, based on more degrees of freedom, and then
proceed to test the estimates due to the individual factors ; if,
however, one, or more, of the estimates due to interactions is
found to be significant, we have reason for suspecting the null
hypothesis.

Table 9.5. Sample for Three Criteria of Classification

                                     Columns (Factor C), k = 1 to n
Groups            Rows
(Factor A)        (Factor B)       1       . . .      k       . . .      n         Means

    i                1           x_i11     . . .    x_i1k     . . .    x_i1n       x̄_i1.
 (i = 1 to l)        .             .                  .                  .           .
                     j           x_ij1     . . .    x_ijk     . . .    x_ijn       x̄_ij.
                     .             .                  .                  .           .
                     m           x_im1     . . .    x_imk     . . .    x_imn       x̄_im.

                   Means         x̄_i.1     . . .    x̄_i.k     . . .    x̄_i.n       x̄_i..

Means of             1           x̄_.11     . . .    x̄_.1k     . . .    x̄_.1n       x̄_.1.
Groups               .             .                  .                  .           .
                     j           x̄_.j1     . . .    x̄_.jk     . . .    x̄_.jn       x̄_.j.
                     .             .                  .                  .           .
                     m           x̄_.m1     . . .    x̄_.mk     . . .    x̄_.mn       x̄_.m.

Means of Columns                 x̄_..1     . . .    x̄_..k     . . .    x̄_..n       x̄_...

Significant interactions, however, do not necessarily 
imply real interactions between the factors concerned. 

The working scientist and technologist are only too aware that, 
from time to time, experiment may give rise to what, at first 
sight, would appear to be a sensational result, but that closer 
scrutiny may well reveal that it is due to chance heterogeneity 
in the data and is no sensation after all. This is just another 
reason why the closest co-operation between statistician and 
technologist is always essential, but never more so than when 
the statistician finds something apparently of " significance ". 

9.7. Worked Examples. 

1. Five doctors each test five treatments for a certain disease, and
observe the number of days each patient takes to recover. The
results are as follows (recovery time in days) :

Doctor.                  Treatment.
              1.     2.     3.     4.     5.

   A          10     14     23     18     20
   B          11     15     24     17     21
   C           9     12     20     16     19
   D           8     13     17     17     20
   E          12     15     19     15     22

Discuss the difference between : (a) doctors, and (b) treatments. 

(A.I.S.) 

Treatment : This is a problem involving two criteria of classification,
each criterion grouping the data into five classes. Thus $m_i$
(the number of items in row i) = $n_j$ (the number of items in column
j) = 5, and N = 25.

Transfer to a new origin at 16.

With the suffix i denoting row and the suffix j denoting column,
let

$T = \sum_i \sum_j x_{ij}$ ; $T_i = \sum_j x_{ij}$ ; $T_j = \sum_i x_{ij}$ ;

then (cf. 9.3 Treatment (3)) :

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - N\bar{x}^2 = \sum_i \sum_j x_{ij}^2 - T^2/N ;$$

$$\sum_i \sum_j (\bar{x}_{i.} - \bar{x})^2 = \sum_i m_i(\bar{x}_{i.} - \bar{x})^2 = \sum_i (T_i^2/m_i) - T^2/N ;$$

and, similarly, $\sum_j n_j(\bar{x}_{.j} - \bar{x})^2 = \sum_j (T_j^2/n_j) - T^2/N$.

We now draw up the following table :





                          Treatment.
Doctor.        1.        2.        3.        4.        5.          T_i      T_i²       Σ_j x_ij²

   A         -6(36)    -2(4)      7(49)     2(4)      4(16)          5       25          109
   B         -5(25)    -1(1)      8(64)     1(1)      5(25)          8       64          116
   C         -7(49)    -4(16)     4(16)     0(0)      3(9)          -4       16           90
   D         -8(64)    -3(9)      1(1)      1(1)      4(16)         -5       25           91
   E         -4(16)    -1(1)      3(9)     -1(1)      6(36)          3        9           63

   T_j        -30       -11        23         3        22          T = 7   ΣT_i² = 139   469
   T_j²       900       121       529         9       484          ΣT_j² = 2043
   Σ_i x_ij²  190        31       139         7       102          469

(Each entry is the recovery time less 16, with its square in brackets.)

Consequently,

(i) Total sum of squared deviations, $\sum_i \sum_j x_{ij}^2 - T^2/N$ =
469 - 7²/25 = 469 - 1.96 = 467.04.

(ii) Sum of squares for treatments, $\sum_j (T_j^2/n_j) - T^2/N$ =
2043/5 - 1.96 = 406.64.

(iii) Sum of squares for doctors, $\sum_i (T_i^2/m_i) - T^2/N$ = 139/5 - 1.96
= 25.84.

(iv) Residual sum of squares = 467.04 - 406.64 - 25.84 = 34.56.

The analysis of variance is, then : 



Source of               Sum of       Degrees of     Estimate of     F.
variation.              squares.     freedom.       variance.

Between treatments      406.64          4             101.66        47.06 **
Between doctors          25.84          4               6.46         2.99
Residual                 34.56         16               2.16

Total                   467.04         24





Entering Table 8.6 at $\nu_1 = 4$, $\nu_2 = 16$, we find the 5% and 1%
points of F to be 3.01 and 4.77 respectively. We conclude, therefore,
that the difference between doctors is hardly significant (at the
5% level), while that between treatments is highly so (highly significant
at the 1% level).
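
The arithmetic of this example may be reproduced as follows. This is a minimal sketch in Python of the two-criteria procedure of 9.4, with one observation per cell; the function name `two_way_anova` and the dictionary keys are our own. No change of origin is made, since the sums of squared deviations do not depend on it.

```python
# Recovery times in days: rows are doctors A to E, columns treatments 1 to 5.
recovery_days = [
    [10, 14, 23, 18, 20],  # doctor A
    [11, 15, 24, 17, 21],  # doctor B
    [9, 12, 20, 16, 19],   # doctor C
    [8, 13, 17, 17, 20],   # doctor D
    [12, 15, 19, 15, 22],  # doctor E
]


def two_way_anova(table):
    """Sums of squares and F ratios for a two-way table, one value per cell."""
    m, n = len(table), len(table[0])           # m rows, n columns
    N = m * n
    T = sum(map(sum, table))
    correction = T * T / N                     # T^2 / N
    total = sum(x * x for row in table for x in row) - correction
    rows = sum(sum(row) ** 2 for row in table) / n - correction
    cols = sum(sum(col) ** 2 for col in zip(*table)) / m - correction
    residual = total - rows - cols
    residual_ms = residual / ((m - 1) * (n - 1))
    return {
        "total": total,
        "rows": rows,
        "cols": cols,
        "residual": residual,
        "F_rows": (rows / (m - 1)) / residual_ms,
        "F_cols": (cols / (n - 1)) / residual_ms,
    }


result = two_way_anova(recovery_days)
for name, value in result.items():
    print(f"{name:10s} {value:8.2f}")
```

Here "rows" refers to doctors and "cols" to treatments, so `F_rows` and `F_cols` are the two ratios to be compared with the tabulated points of F for 4 and 16 degrees of freedom.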

2. P. R. Rider (An Introduction to Modern Statistical
Methods, John Wiley, New York) quotes the following Western
Electric Co. data on porosity readings of 3 lots of condenser paper.
There are 3 readings on each of 9 rolls from each lot.



Porosity Readings on Condenser Paper

Lot         Reading                          Roll number.
number.     number.      1.    2.    3.    4.    5.    6.    7.    8.    9.

   I           1        1.5   1.5   2.7   3.0   3.4   2.1   2.0   3.0   5.1
               2        1.7   1.6   1.9   2.4   5.6   4.1   2.5   2.0   5.0
               3        1.6   1.7   2.0   2.6   5.6   4.6   2.8   1.9   4.0

   II          1        1.9   2.3   1.8   1.9   2.0   3.0   2.4   1.7   2.6
               2        1.5   2.4   2.9   3.5   1.9   2.6   2.0   1.5   4.3
               3        2.1   2.4   4.7   2.8   2.1   3.5   2.1   2.0   2.4

   III         1        2.5   3.2   1.4   7.8   3.2   1.9   2.0   1.1   2.1
               2        2.9   5.5   1.5   5.2   2.5   2.2   2.4   1.4   2.5
               3        3.3   7.1   3.4   5.0   4.0   3.1   3.7   4.1   1.9



We shall carry out the appropriate analysis of variance 
assuming, for the time being, that we have here three 
criteria of classification — the roll number dividing the data 
into nine classes, the lot number and the reading number 
each dividing the data into three classes. This will illus- 
trate the method employed in such a case. Then, less 
artificially, we shall regard the data as classified by two 
criteria (roll and lot) with three values of the variate,
instead of one, given for each Lot x Roll.

First Treatment : (1) We draw up the following table of totals :

Lot.    Reading.                          Roll.                                   Totals.
                      1.    2.    3.    4.    5.    6.    7.    8.    9.

 I         1         1.5   1.5   2.7   3.0   3.4   2.1   2.0   3.0   5.1           24.3
           2         1.7   1.6   1.9   2.4   5.6   4.1   2.5   2.0   5.0           26.8
           3         1.6   1.7   2.0   2.6   5.6   4.6   2.8   1.9   4.0           26.8
        Totals       4.8   4.8   6.6   8.0  14.6  10.8   7.3   6.9  14.1           77.9   Total (Lot I)

 II        1         1.9   2.3   1.8   1.9   2.0   3.0   2.4   1.7   2.6           19.6
           2         1.5   2.4   2.9   3.5   1.9   2.6   2.0   1.5   4.3           22.6
           3         2.1   2.4   4.7   2.8   2.1   3.5   2.1   2.0   2.4           24.1
        Totals       5.5   7.1   9.4   8.2   6.0   9.1   6.5   5.2   9.3           66.3   Total (Lot II)

 III       1         2.5   3.2   1.4   7.8   3.2   1.9   2.0   1.1   2.1           25.2
           2         2.9   5.5   1.5   5.2   2.5   2.2   2.4   1.4   2.5           26.1
           3         3.3   7.1   3.4   5.0   4.0   3.1   3.7   4.1   1.9           35.6
        Totals       8.7  15.8   6.3  18.0   9.7   7.2   8.1   6.6   6.5           86.9   Total (Lot III)

Total (Rolls)       19.0  27.7  22.3  34.2  30.3  27.1  21.9  18.7  29.9          231.1   Grand Total

Total (Readings)     1     24.3 + 19.6 + 25.2 = 69.1
                     2     26.8 + 22.6 + 26.1 = 75.5
                     3     26.8 + 24.1 + 35.6 = 86.5
                                                231.1

Thus

$\sum_i \sum_j \sum_k x_{ijk}^2$ = (1.5² + 1.5² + 2.7² + . . . + 3.7² + 4.1² + 1.9²) = 812.41

and T = 231.1, N = 3 x 3 x 9 = 81 give T²/N = 659.35.

The total sum of squared deviations from the mean,

$\sum_i \sum_j \sum_k x_{ijk}^2 - T^2/N$ = 812.41 - 659.35 = 153.06.



(2) We draw up the following lot x roll table :

Lot.                                     Roll.                                   Total
                    1.    2.    3.    4.    5.    6.    7.    8.    9.          (Lots).

 I                 4.8   4.8   6.6   8.0  14.6  10.8   7.3   6.9  14.1           77.9
 II                5.5   7.1   9.4   8.2   6.0   9.1   6.5   5.2   9.3           66.3
 III               8.7  15.8   6.3  18.0   9.7   7.2   8.1   6.6   6.5           86.9

Total (Rolls)     19.0  27.7  22.3  34.2  30.3  27.1  21.9  18.7  29.9


The sum of squares for lot x roll classification (i.e., a two-criteria
classification) is thus (4.8² + 4.8² + 6.6² + . . . + 8.1² +
6.6² + 6.5²) = 2,280.37, and the sum of the squared deviations
from the mean is 2,280.37/3 - 659.35 = 100.77. Note that we
divide 2,280.37 by 3 because each entry in the body of this lot x roll
table is the sum of three readings.

The sum of squared deviations for rolls is (19.0² + 27.7² + . . .
+ 18.7² + 29.9²)/9 - 659.35 = 26.31 (why do we here divide by
9 ?), while the sum of squared deviations for lots is (77.9² + 66.3² +
86.9²)/27 - 659.35 = 7.90 (why do we, this time, divide by 27 ?).
Finally, the residual sum of squared deviations, now called Interaction
(Lot x Roll), is found by subtracting from the total sum of
squared deviations for this classification the sum of that for Rolls
and that for Lots, i.e., 100.77 - 26.31 - 7.90 = 66.56.

(3) The reading x roll table is :

Reading.                                 Roll.                                   Total
                    1.    2.    3.    4.    5.    6.    7.    8.    9.          (Readings).

 1                 5.9   7.0   5.9  12.7   8.6   7.0   6.4   5.8   9.8           69.1
 2                 6.1   9.5   6.3  11.1  10.0   8.9   6.9   4.9  11.8           75.5
 3                 7.0  11.2  10.1  10.4  11.7  11.2   8.6   8.0   8.3           86.5

Total (Rolls)     19.0  27.7  22.3  34.2  30.3  27.1  21.9  18.7  29.9


Here the sum of squares, 5.9² + 7.0² + 5.9² + . . . + 8.6² +
8.0² + 8.3² = 2,107.73. The sum of squared deviations from the
mean is, then, 2,107.73/3 - 659.35 = 43.23. The sum of squared
deviations for readings is (69.1² + 75.5² + 86.5²)/27 - 659.35 =
5.73. We have the corresponding sum for rolls, already : 26.31.
Interaction (Rolls x Reading) is, then, 43.23 - 5.73 - 26.31 = 10.19.

(4) The lot x reading table is :

Lot.               Reading.
             1.       2.       3.

 I          24.3     26.8     26.8
 II         19.6     22.6     24.1
 III        25.2     26.1     35.6



The sum of squares in this case is 24.3² + 26.8² + . . . + 26.1² +
35.6² = 6,086.25. Hence the sum of squared deviations for lot x
reading is 6,086.25/9 - 659.35 = 16.90. We already have the
sums of squared deviations for lots (= 7.90) and that for
readings (= 5.73). Consequently, Interaction (Lot x Reading) is
16.90 - 7.90 - 5.73 = 3.27.

(5) The analysis of variance table is :

Source of variation.                 Sum of       Degrees of     Estimate of     F.
                                     squares.     freedom.       variance.

Between rolls                         26.31          8             3.29          3.14 **
Between lots                           7.90          2             3.95          3.81 *
Between readings                       5.73          2             2.87          2.76
Interaction (Rolls x Lots)            66.56         16             4.16          4.00 **
Interaction (Lots x Readings)          3.27          4             0.82
Interaction (Readings x Rolls)        10.19         16             0.64
Residual                              33.10         32             1.04

Total                                153.06         80

We see at once that the Interactions, lots x readings and
readings x rolls, are not significant; nor, for that matter, are
they significantly small. However, Interaction rolls x lots is
significant at the 1% level and so is the variation between rolls,
while that between lots is significant at the 5% level.

Since the two Interactions, lots x readings and readings x
rolls, are not significant, we may combine the corresponding sums
of squares with that for residual to obtain a more accurate estimate
of the assumed population variance; this is

(3.27 + 10.19 + 33.10)/(4 + 16 + 32) = 0.895.

We find, as the reader will confirm for himself, that the levels of
significance are unaltered when this new estimate of $\sigma^2$ is used, and
we conclude that the variation of rolls within lots and that between
rolls are highly significant, while that between lots is significant.

Second Treatment : The conclusion we have reached justifies the
view we suggested originally that it is less artificial to regard the data
as classified by two criteria only (roll and lot), with three values of
the variate, instead of one, being taken. This being the case, the
situation is summarised by the table given in Step (2) of our first
treatment of the problem. The corresponding analysis of variance is:



Source of variation.                 Sum of       Degrees of     Estimate of     F.
                                     squares.     freedom.       variance.

Between rolls                         26.31          8             3.29          3.39 **
Between lots                           7.90          2             3.95          4.07 *
Interaction (Rolls x Lots)            66.56         16             4.16          4.29 **
Residual                              52.29         54             0.97

Total                                153.06         80

Quite clearly, our null hypothesis, of the homogeneity of the 
condenser paper with respect to these two factors of classification, 
lots and rolls, breaks down. 
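
The lot x roll part of this second treatment can be checked from the cell totals alone. The sketch below, in Python, is offered only as an illustration: the function name `replicated_two_way` is our own, and the total sum of squared deviations is not recomputed but taken as the printed figure 153.06, an assumption stated in the code.

```python
# Lot x roll cell totals (each the sum of three readings), from Step (2).
cell_totals = [
    [4.8, 4.8, 6.6, 8.0, 14.6, 10.8, 7.3, 6.9, 14.1],  # Lot I
    [5.5, 7.1, 9.4, 8.2, 6.0, 9.1, 6.5, 5.2, 9.3],     # Lot II
    [8.7, 15.8, 6.3, 18.0, 9.7, 7.2, 8.1, 6.6, 6.5],   # Lot III
]
READINGS_PER_CELL = 3
PRINTED_TOTAL_SS = 153.06  # assumption: total sum of squares as printed in the text


def replicated_two_way(cells, k, total_ss):
    """Two criteria with k values per cell; residual = total - between-cells."""
    lots, rolls = len(cells), len(cells[0])
    N = lots * rolls * k
    T = sum(map(sum, cells))
    correction = T * T / N
    ss_cells = sum(c * c for row in cells for c in row) / k - correction
    ss_lots = sum(sum(row) ** 2 for row in cells) / (rolls * k) - correction
    ss_rolls = sum(sum(col) ** 2 for col in zip(*cells)) / (lots * k) - correction
    ss = {
        "rolls": ss_rolls,
        "lots": ss_lots,
        "interaction": ss_cells - ss_lots - ss_rolls,
        "residual": total_ss - ss_cells,
    }
    df = {
        "rolls": rolls - 1,
        "lots": lots - 1,
        "interaction": (rolls - 1) * (lots - 1),
        "residual": N - lots * rolls,
    }
    ms = {name: ss[name] / df[name] for name in ss}
    f = {name: ms[name] / ms["residual"] for name in ("rolls", "lots", "interaction")}
    return ss, df, ms, f


ss, df, ms, f = replicated_two_way(cell_totals, READINGS_PER_CELL, PRINTED_TOTAL_SS)
for name in ss:
    print(f"{name:12s} SS = {ss[name]:7.2f}  d.f. = {df[name]:2d}  estimate = {ms[name]:5.2f}")
```

The sums of squares agree with those of the table to within a unit or two in the second decimal place, the small differences arising because the text rounds T²/N to 659.35 before subtracting.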






9.8. Latin Squares. When, in the case of three-factor classification,
each criterion results in the same number of classes,
m, say, some simplification may be effected in our analysis
of variance by the use of an arrangement known as a Latin
Square. Essentially, this device aims at isolating the separate
variations due to simultaneously operating causal factors.

Let us suppose that we wish to investigate the yield per acre
of five varieties of a certain crop, when subjected to treatment by
two types of fertiliser, each of five different strengths. We divide
the plot of land used for the experiment into 5² sub-plots,
considered to be schematically arranged in five parallel rows and
five parallel columns. Each of the five columns is treated
with one of the five strengths of one of the fertilisers (call it
Fertiliser A) ; each of the rows is likewise treated with one of
the five strengths of the second fertiliser (B). Then the five
varieties of the crop under investigation are sown at random
in the sub-plots, but in such a way that any one variety occurs
but once in any row or column. Denoting the crop-varieties
by A, B, C, D, E, we shall have some such arrangement as :

                             Fertiliser A
                 Strengths   1   2   3   4   5

                     1       A   B   C   D   E
  Fertiliser B       2       E   C   A   B   D
  Strengths          3       B   D   E   C   A
                     4       D   E   B   A   C
                     5       C   A   D   E   B

Now assume that the following figures (fictitious) for yield 
per acre are obtained : 



          A 3.2     B 2.0     C 2.2     D 1.8     E 1.8
          E 2.0     C 1.8     A 2.8     B 2.4     D 2.2
          B 3.0     D 1.6     E 1.6     C 2.0     A 3.6
          D 2.4     E 1.2     B 2.6     A 2.6     C 2.4
          C 2.6     A 2.2     D 2.0     E 1.4     B 2.8

Transferring our origin to 2.2 and multiplying each entry by
5, we have :

                                     Fertiliser A                               Totals (B)            Totals (Varieties)
                Strength    1          2          3          4          5        T     Σx²    T²              T       T²

Fertiliser B       1      A 5(25)    B -1(1)    C 0(0)     D -2(4)    E -2(4)     0     34     0       A      17     289
                   2      E -1(1)    C -2(4)    A 3(9)     B 1(1)     D 0(0)      1     15     1       B       9      81
                   3      B 4(16)    D -3(9)    E -3(9)    C -1(1)    A 7(49)     4     84    16       C       0       0
                   4      D 1(1)     E -5(25)   B 2(4)     A 2(4)     C 1(1)      1     35     1       D      -5      25
                   5      C 2(4)     A 0(0)     D -1(1)    E -4(16)   B 3(9)      0     30     0       E     -15     225

                                                                                  6    198    18               6     620

Totals (A)         T        11        -11          1         -4          9        6
                   Σx²      47         39         23         26         63      198
                   T²      121        121          1         16         81      340

With T = 6 and N = n² = 25, T²/N = 1.44. The sum total
of squares is 198 and so the total sum of squared deviations is
196.56. The sum of squared deviations for Fertiliser A is
340/5 - 1.44 = 66.56. The sum of squared deviations for
Fertiliser B is 18/5 - 1.44 = 2.16, and the sum of squared
deviations for Varieties is 620/5 - 1.44 = 122.56.

The analysis of variance table is then :

Source of variation.                     Sum of       Degrees of     Estimate of     F.
                                         squares.     freedom.       variance.

Between varieties                        122.56          4            30.64         69.6 **
Between strengths of Fertiliser A         66.56          4            16.64         37.8 **
Between strengths of Fertiliser B          2.16          4             0.54          1.2
Residual                                   5.28         12             0.44

Total                                    196.56         24

Both the variation between varieties and that due to the
different strengths of Fertiliser A are significant at the 1%
level.

By pooling the sum of squares for Fertiliser B and the
Residual sum of squares, we may obtain a more accurate
estimate of the assumed population variance : (2.16 + 5.28)/
(4 + 12) = 0.465. This is the estimated variance of the
yield of a single sub-plot. The estimated variance of the
mean of five such sub-plots is, therefore, 0.465/5 = 0.093.
That of the difference of the means of any two independent
samples of 5 sub-plots is 2 x 0.093 = 0.186. The Standard
Error of the difference of the means of two such samples is,
consequently, $(0.186)^{1/2}$ = 0.43. For 16 degrees of freedom the
least difference, m, between the means of any two samples of
5 that is significant at the 5% level is given by : m/0.43 = 2.12
or m = 0.9116.

It will be seen therefore that all the five varieties differ 
significantly, that only strengths 1 and 5 of Fertiliser A do not 
differ significantly, while none of the strengths of Fertiliser B 
differ significantly at the 5% level. 
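
The Latin Square arithmetic above lends itself to a short check. The following Python sketch is illustrative only; the function name `latin_square_anova` and its arguments `origin` and `scale` are our own. It codes the yields exactly as in the text (origin 2.2, factor 5) and splits the total sum of squared deviations into rows (Fertiliser B), columns (Fertiliser A), letters (varieties) and residual.

```python
# Layout of varieties and the (fictitious) yields per acre, row by row.
layout = ["ABCDE", "ECABD", "BDECA", "DEBAC", "CADEB"]
yields = [
    [3.2, 2.0, 2.2, 1.8, 1.8],
    [2.0, 1.8, 2.8, 2.4, 2.2],
    [3.0, 1.6, 1.6, 2.0, 3.6],
    [2.4, 1.2, 2.6, 2.6, 2.4],
    [2.6, 2.2, 2.0, 1.4, 2.8],
]


def latin_square_anova(layout, values, origin=0.0, scale=1.0):
    """Sums of squared deviations for rows, columns, letters and residual."""
    n = len(values)
    # Code the data; rounding removes floating-point dust such as 4.999999.
    coded = [[round((v - origin) * scale, 6) for v in row] for row in values]
    T = sum(map(sum, coded))
    correction = T * T / (n * n)
    total = sum(c * c for row in coded for c in row) - correction
    rows = sum(sum(row) ** 2 for row in coded) / n - correction
    cols = sum(sum(col) ** 2 for col in zip(*coded)) / n - correction
    letter_totals = {}
    for letters, row in zip(layout, coded):
        for letter, c in zip(letters, row):
            letter_totals[letter] = letter_totals.get(letter, 0.0) + c
    letters_ss = sum(t * t for t in letter_totals.values()) / n - correction
    residual = total - rows - cols - letters_ss
    return {
        "total": total,
        "rows": rows,
        "cols": cols,
        "letters": letters_ss,
        "residual": residual,
        "letter_totals": letter_totals,
    }


result = latin_square_anova(layout, yields, origin=2.2, scale=5)
pooled = (result["rows"] + result["residual"]) / (4 + 12)  # pooled variance estimate
print(result["letter_totals"], round(pooled, 3))
```

All figures are in the coded units of the text; to return to yields per acre, sums of squares would be divided by 5² = 25.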

9.9. Making Latin Squares. If a Latin Square has n rows
and n columns it is said to be a square of order n. The number
of possible squares of order n increases very rapidly with n.
There are, for instance, 576 squares of order 4; 161,280 of
order 5; 812,851,200 of order 6; and 61,479,419,904,000 of order 7.
(See R. A. Fisher and F. Yates, Statistical Tables for Use in
Biological, Agricultural and Medical Research.)

When the letters of the first row and first column are in correct 
alphabetical order, the square is a standard square. Thus 

A B C
B C A
C A B

is a standard square of order three — indeed, it is the only 
standard square of that order, as the reader will easily realise. 
The standard squares of order 4 are : 

A B C D      A B C D      A B C D      A B C D
B A D C      B D A C      B C D A      B A D C
C D B A      C A D B      C D A B      C D A B
D C A B      D C B A      D A B C      D C B A

From these standard squares all the remaining, essentially
different, non-standard squares may be derived. It is important
to understand, however, that what is required when
deriving such non-standard squares is a new pattern or lay-out.
Merely interchanging letters does not suffice. For example,
the two squares

A B C D      and      D C B A
B A D C               C D A B
C D B A               B A C D
D C A B               A B D C

present the same pattern and are, therefore, no different. If,
then, we require to derive a non-standard square from a given
standard square, we must permute all columns and all rows
except the first. We thereby obtain a total of 12 possible
squares of order 3 (12 = 3! x 2!) and 576 of order 4 (in this
case each standard square, of which there are four, yields 4!
different column arrangements and 3! different row arrangements :
4 x 4! x 3! = 576).
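
For small orders these counts can be confirmed by direct enumeration. The sketch below, in Python, builds squares row by row and rejects any row that repeats a symbol in a column; the function name `count_latin_squares` and its `reduced` flag (meaning "standard squares only") are our own illustrative choices, and the method is a plain exhaustive search, practicable only for very small n.

```python
from itertools import permutations


def count_latin_squares(n, reduced=False):
    """Count Latin squares of order n; with reduced=True, standard squares only."""
    symbols = tuple(range(n))
    all_rows = list(permutations(symbols))
    count = 0

    def extend(square):
        nonlocal count
        if len(square) == n:
            count += 1
            return
        r = len(square)  # index of the row about to be added
        for row in all_rows:
            # A standard square has its first column in natural order.
            if reduced and row[0] != r:
                continue
            # No symbol may repeat within any column.
            if all(row[c] != prev[c] for prev in square for c in range(n)):
                extend(square + [row])

    # A standard square also has its first row in natural order.
    first_rows = [symbols] if reduced else all_rows
    for first in first_rows:
        extend([first])
    return count


for order in (3, 4):
    print(order, count_latin_squares(order, reduced=True), count_latin_squares(order))
```

The relation used in the text, total = (number of standard squares) x n! x (n - 1)!, can be checked against the two counts returned for each order.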

When all the standard squares of a given order have been
set down, we say that the standard squares have been
enumerated. This has been done for n less than 8, although
for n ≥ 8 a considerable number have been listed. To choose
a Latin square of given order, then, we select a standard square
at random from those enumerated, and permute at random,
using, both for selection and permutation, a table of random
numbers.

EXERCISES ON CHAPTER NINE 



1. The following results were obtained in four samples :

(1)    6   14   12    6    2    5
(2)   10   17    6   10   10   16
(3)   11   11   10   23    8   17
(4)   19    2   20   16   14   20

Carry out an analysis of variance on these data. (L.U.)

2. Four breeds of cattle B1, B2, B3, B4 were fed on three different
rations, R1, R2, R3. Gains in weight in pounds over a given period
were recorded.

            B1       B2       B3       B4

  R1       46.5     62       41       45
  R2       47.5     41.5     22       31.5
  R3       50       40       25.5     28.5

Is there a significant difference : (a) between breeds ; (b) between
rations ?


3. Passenger Traffic Receipts of Main-Line Railways and L.P.T.B.
(Weekly averages, £000)

            First        Second       Third        Fourth
            Quarter.     Quarter.     Quarter.     Quarter.

  1944      3,376        3,548        4,120        3,836
  1945      3,320        4,072        4,808        3,872
  1946      3,310        3,884        4,758        3,611
  1947      2,001        3,752        4,556        3,703

Carry out an analysis of variance on these data. Is the between-years
difference significant ?

4. A chemical purification process is carried out in a particular
plant with four solvents (i = 1, 2, 3, 4) at three different, equidistant
temperatures. For every one of the 4 x 3 = 12 combinations of
solvents with temperatures the process is repeated four times and
the resulting 48 test measurements are shown below (a low value
indicates a high degree of purity).

                                           Solvent
Temperature       i = 1              i = 2                 i = 3                   i = 4

   j = 1        66.0  68.3         71.2  70.3            66.2  64.6              79.1  66.6
                68.6  70.0         71.8  71.8            70.1  60.0              66.2  [illegible]

   j = 2        63.4  66.4         63.0  64.1            70.7  67.5              60.0  62.7
                67.2  66.2         71.2  67.0            60.0  [illegible]       60.3  62.4

   j = 3        64.9  71.6         [illegible]  70.8     [illegible]  68.0       64.0  68.8
                69.5  73.6         66.9  70.4            66.2  [illegible]       72.0  72.8

Carry out the appropriate analysis of variance. Are there differences
between solvents and temperatures, taken as a whole ? Is there
any interaction between solvents and temperatures ? (L.U.)

5. The atmosphere in 4 different districts of a large town was
sampled, the samples being taken at 4 different heights. Four
different tests for the presence of a certain chemical were made on the
samples. The arrangement is shown in the following table with the
% by weight of the chemical as determined by the tests. Letters
denote the different tests.

                          Districts
Heights        1          2          3          4

   1         A 8        B 5.3      C 4.1      D 5
   2         D 6.8      A 4.0      B 4.1      C 3.2
   3         B 6.3      C 4.7      D 4.0      A 5
   4         C 5.7      D 3.3      A 4.0      B 4.2

Is there evidence of significant variation from district to district
and between heights in the percentage of the chemical present in the
atmosphere ?

Can it be said that there is a decided difference between the
sensitivity of the tests ?

Solutions

1. Variation between samples significant at 5% point but not at 
1% point. 

2. No significant difference between breeds or between rations. 

3. No significant variation between years but that between 
quarters is highly significant. 

4. Variation between temperatures is not quite significant at 5% 
point: that between solvents is significant at that point: interaction 
between solvents and temperatures significant at 1% point. 

5. Variation between districts significant at 5% point but not at
1% point; no significant variation between heights or between
sensitivity of tests.

TESTING REGRESSION AND CORRELATION

10.1. The Correlation Coefficient Again. We now return to 
the problems raised at the beginning of Chapter Seven : 

How do we know that a value of r, the sample correlation
coefficient, calculated from a sample of N from a bivariate
normal population is really significant ? Is there a way
of deciding whether such a value of r could have arisen by
chance as a result of random sampling from an uncorrelated
parent population ?

Linked closely with this problem are several others : 

If a sample of N yields a value of $r = r_0$, how can we
test whether it can have been drawn from a population
known to have a given $\rho$ ?

Again, how shall we test whether two values of r obtained 
from different samples are consistent with the hypothesis 
of random sampling from a common parent population ? 

Finally, given a number of independent estimates of a 
population correlation coefficient, how may we combine 
them to obtain an improved estimate ? 

We start to tackle these problems by what may at first 
appear to be a rather indirect method. For we shall use the 
technique of analysis of variance (or, more correctly, in this 
case, analysis of covariance) to test the significance of a linear 
regression coefficient calculated from a sample drawn from a 
bivariate normal population. But this, as we shall soon see, 
is equivalent to testing the significance of a value of r, or, what 
is the same thing, to testing the hypothesis that in the parent
population, $\rho = 0$.

10.2. Testing a Regression Coefficient. Let $(x_i, y_i)$, (i = 1,
2, . . . N), be a sample of N pairs from what we assume
to be an uncorrelated bivariate normal population. Taking
the sample mean as origin ($\bar{x} = 0 = \bar{y}$), the regression equation
of y on x is

$$y = b_{yx}x, \text{ where } b_{yx} = s_{xy}/s_x^2. \qquad (10.2.1)$$

Let $Y_i$ be the value of the ordinate at $x = x_i$ on this
regression line. Then the sum of the squared deviations of
the y's from the sample mean ($\bar{y} = 0$) is simply $\sum_{i=1}^{N} y_i^2$. But

$$\sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} (y_i - Y_i + Y_i)^2 = \sum_{i=1}^{N} (y_i - Y_i)^2 + 2\sum_{i=1}^{N} (y_i - Y_i)Y_i + \sum_{i=1}^{N} Y_i^2.$$

However,

$$\sum_{i=1}^{N} (y_i - Y_i)Y_i = \sum_{i=1}^{N} (y_i - b_{yx}x_i)b_{yx}x_i = b_{yx}\Big(\sum_{i=1}^{N} x_iy_i - b_{yx}\sum_{i=1}^{N} x_i^2\Big) = 0,$$

since $b_{yx} = s_{xy}/s_x^2 = \sum_{i=1}^{N} x_iy_i \Big/ \sum_{i=1}^{N} x_i^2$.

Hence

$$\sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} (y_i - Y_i)^2 + \sum_{i=1}^{N} Y_i^2. \qquad (10.2.2)$$

Thus:

The sum of squared deviations of observed values of y
from the sample mean = sum of squared deviations of
observed values of y from the regression line of y on x
+ sum of squared deviations of the corresponding points
on the regression line from the sample mean,

or

Variation about mean = Variation about regression line
+ Variation of regression line about mean.
Now (10.2.2) may be re- written : 

2 yi' - S [yt - h*d* + V S *(* 

I - 1 (-1 f— I 

= 2 jfr - 2b, r 2 x iyi + 6 yx » 2 x,* + V* S .v,» 

i-l l - 1 i- 1 1-1 

= AV(1 - Sx,*/SxV) + Nsf Wis** . %») 

A" 

i.e., 3B y<* = iVs y «(l -»•*) + AV* . (10.2.3.) 

now seen as an obvious algebraic identity. 

Thus the sample variation of $y$ about the regression line of
$y$ on $x$ is measured by $Ns_y^2(1 - r^2)$, while the variation of
the regression about the sample mean is measured by $Ns_y^2r^2$.

To the term $\sum_{i=1}^{N} y_i^2$ there correspond $N - 1$ degrees of
freedom, since the $y$'s are subject only to the restriction that
their mean is given (in our treatment, $\bar{y} = 0$); corresponding
to the sum $\sum_{i=1}^{N} (y_i - Y_i)^2$ there is one additional restraint:
that the regression coefficient of $y$ on $x$ shall be $b_{yx}$. Thus
corresponding to $Ns_y^2(1 - r^2)$ we have $N - 2$ degrees of
freedom. Consequently $\sum_{i=1}^{N} Y_i^2 = Ns_y^2r^2$ has but one degree
of freedom.
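The decomposition (10.2.2) and its form (10.2.3) can be checked numerically. The following is a minimal sketch; the six data pairs and the function name are hypothetical, chosen only for illustration.

```python
def regression_decomposition(xs, ys):
    """Split the variation of y into residual and regression parts."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Take the sample mean as origin, as in the text.
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    b_yx = sum_xy / sum_xx                    # regression coefficient of y on x
    r_squared = sum_xy ** 2 / (sum_xx * sum_yy)
    fitted = [b_yx * a for a in x]            # the ordinates Y_i
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    regression = sum(fi ** 2 for fi in fitted)
    return {"total": sum_yy, "residual": residual,
            "regression": regression, "r_squared": r_squared}

# Hypothetical illustrative data (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 6.3]
parts = regression_decomposition(xs, ys)
print(parts)
```

The residual part carries N - 2 degrees of freedom and the regression part one, so the two mean squares are `parts["residual"] / (len(xs) - 2)` and `parts["regression"]`.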

Now suppose that the parent population is uncorrelated
($\rho = 0$); then the sample variation of the regression line about
the mean and the random variation of the $y$'s about the
regression line should yield estimates of the corresponding
population parameter not significantly different. If, on the
other hand, the regression coefficient is significant, i.e., if there
is in fact an association between the variates in the population
of the kind indicated by the regression equation, the estimate
provided by $\sum_{i=1}^{N} Y_i^2/1 = Ns_y^2r^2$ should be significantly greater
than that provided by

$$\sum_{i=1}^{N} (y_i - Y_i)^2/(N - 2) = Ns_y^2(1 - r^2)/(N - 2).$$

We may therefore set out an analysis of covariance table as
follows:

Source of variation                            Sum of squares        Degrees of freedom    Mean square

Of regression line about mean                  $Ns_y^2r^2$           $1$                   $Ns_y^2r^2$

Residual (of variate about regression line)    $Ns_y^2(1 - r^2)$     $N - 2$               $Ns_y^2(1 - r^2)/(N - 2)$

Total                                          $Ns_y^2$              $N - 1$





If then the sample data does indicate a significant association
of the variates in the form suggested by the regression
equation, the value of $z$ given by

$$z = \tfrac{1}{2}\log_e \frac{r^2(N - 2)}{1 - r^2} \tag{10.2.4}$$

will be significant at least at the 5% point. Alternatively, the
value of

$$F = r^2(N - 2)/(1 - r^2) \tag{10.2.4(a)}$$

can be tested using Table 8.0.

A significant value of either $z$ or $F$ requires the rejection of
the null hypothesis that, in the parent population, $\rho = 0$.
Thus when we test the significance of a regression coefficient we
are also testing the significance of $r$, the correlation coefficient.

If neither $z$- nor $F$-tables are available, we may use $t$-tables,
for, as we shall now show, $r[(N - 2)/(1 - r^2)]^{1/2}$ is actually
distributed like $t$ with $N - 2$ degrees of freedom.

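10.3. Relation between the t- and z-distributions. We have
(8.6.7)

$$dp(z) = \frac{2\,\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\nu_1/2,\ \nu_2/2)} \cdot \frac{\exp(\nu_1 z)}{(\nu_1 \exp 2z + \nu_2)^{(\nu_1 + \nu_2)/2}}\,dz$$

Now put $z = \tfrac{1}{2}\log_e t^2$, $\nu_1 = 1$ and $\nu_2 = \nu$. Then $\exp z = t$,
$dz = dt/t$, and

$$dp = \frac{2\,\nu^{\nu/2}}{B(\nu/2,\ \tfrac{1}{2})} \cdot \frac{dt}{(t^2 + \nu)^{(\nu + 1)/2}} = \frac{2}{\nu^{1/2}B(\nu/2,\ \tfrac{1}{2})}\,(1 + t^2/\nu)^{-(\nu + 1)/2}\,dt$$

However, since $z = \tfrac{1}{2}\log_e t^2$ accounts only for values of $t$ from 0 to $\infty$, while $t$ ranges from
$-\infty$ to $+\infty$, we must remove the factor 2, with the result
that

$$dp(t) = \frac{1}{\nu^{1/2}B(\nu/2,\ \tfrac{1}{2})}\,(1 + t^2/\nu)^{-(\nu + 1)/2}\,dt \quad \text{(see Chapter Eight)}$$

In other words the distribution of $t$, like that of $F$, is a
special case of that of $z$. Consequently, $r[(N - 2)/(1 - r^2)]^{1/2}$
is distributed like $t$ with $N - 2$ degrees of freedom.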

10.4. Worked Example:

In a sample of $N = 16$ pairs of values drawn from a bivariate
population, the observed correlation coefficient between the variates
is 0.5. Is this value significant? Find the minimum value of $r$
for a sample of this size which is significant at the 5% level.

Treatment:

$N = 16$, $r = 0.5$, $\nu_1 = 1$, $\nu_2 = 14$.

Then $F = 0.25 \times 14/(1 - 0.25) = 4.667$

for 1 and 14 degrees of freedom; or

$t = 2.16$ for 14 degrees of freedom.

The 5% and 1% points of $F$ for $\nu_1 = 1$ and $\nu_2 = 14$ degrees of
freedom are 4.60 and 8.86; the value of $t$ significant at the 5% level
is 2.14 (using Tables 9.6 and 9.1 respectively). We conclude, therefore,
that the observed value of $r$, 0.5, is just significant at the 5%
level: there is less than 1 chance in 20 but more than 1 chance in
100 that this value should arise by chance in random sampling of
an uncorrelated population.

The required minimum value of $r$ is given by

$r^2 \times 14/(1 - r^2) = 4.60$

or $r = 0.497$
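The arithmetic of this example can be reproduced in a few lines. This is a sketch only: the function names are illustrative, and the 5% point of F (4.60) is taken from the text rather than computed.

```python
import math

def f_from_r(r, n):
    """F = r^2 (N - 2) / (1 - r^2), with 1 and N - 2 degrees of freedom."""
    return r * r * (n - 2) / (1 - r * r)

def t_from_r(r, n):
    """t = r [(N - 2) / (1 - r^2)]^(1/2), with N - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def min_significant_r(f_critical, n):
    """Solve r^2 (N - 2) / (1 - r^2) = f_critical for r."""
    return math.sqrt(f_critical / (f_critical + n - 2))

N, r = 16, 0.5
F = f_from_r(r, N)
t = t_from_r(r, N)
r_min = min_significant_r(4.60, N)   # 4.60 is the tabulated 5% point quoted above
print(round(F, 3), round(t, 2), round(r_min, 3))
```

Since t squared equals F here, testing either statistic leads to the same conclusion.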

10.5. The Distribution of r. So far we have assumed that
the population we have been sampling is uncorrelated. We
must now consider the problem of testing the significance of an
observed value of $r$ when $\rho \neq 0$.

The distribution of $r$ for random samples of $N$ pairs of values
from a bivariate normal population in which $\rho \neq 0$ is by no
means normal, and in the neighbourhood of $\rho = \pm 1$ it is
extremely skew even for large $N$. It was for this reason that
Fisher introduced the important transformation

$$z = \tfrac{1}{2}\log_e [(1 + r)/(1 - r)] = \tanh^{-1} r \tag{10.5.1}$$

The importance of this transformation lies in the fact that:

$z$ is approximately normally distributed with mean
$\tfrac{1}{2}\log_e [(1 + \rho)/(1 - \rho)] = \tanh^{-1}\rho$ and variance $1/(N - 3)$
and, as $N$ increases, this distribution tends to normality
quite rapidly.

(a) To decide whether a value of $r$ calculated from a sample
of $N$ pairs of values from a bivariate normal distribution is
consistent with a known value of the population correlation
coefficient $\rho$, we put

$z = \tfrac{1}{2}\log_e [(1 + r)/(1 - r)] = \tanh^{-1} r$;
$Z = \tfrac{1}{2}\log_e [(1 + \rho)/(1 - \rho)] = \tanh^{-1}\rho$.

Then $(z - Z)/(N - 3)^{-1/2}$ is approximately normally distributed
with unit variance. Now the value of such a variate which is
numerically exceeded with a probability of 5% is 1.96 (see Table 5.4).
Therefore for $r$ to differ significantly from the given value of $\rho$
at the 5% level, we must have

$$|z - Z|\,(N - 3)^{1/2} > 1.96$$

(b) Now assume that a sample of $N_1$ pairs yields a value of
$r = r_1$ and a second sample of $N_2$ pairs a value $r = r_2$. If the
sampling is strictly random from the same population or from
two equivalent populations, $r_1$ and $r_2$ will not differ significantly.

Should there be a significant difference, however, we should
have reason to suspect either that the sampling had not been
strictly random or that the two samples had been drawn from
different populations.

Let $z_1$ be the $z$-transform of $r_1$ and $z_2$ that of $r_2$. On the
hypothesis that the two samples are random samples from the
same population (or equivalent populations), we have (p. 141),

$$\operatorname{var}(z_1 - z_2) = \operatorname{var} z_1 + \operatorname{var} z_2 = 1/(N_1 - 3) + 1/(N_2 - 3)$$

Hence the standard error of $z_1 - z_2$ is

$$\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \quad \text{and} \quad (z_1 - z_2)\Big/\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$$

will be approximately normally distributed with unit variance.
Consequently if

$$|z_1 - z_2|\Big/\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} < 1.96$$

there is no significant difference between $r_1$ and $r_2$ and we have
no grounds for rejecting the hypothesis that the samples have
been drawn at random from the same population; if, however,

$$|z_1 - z_2|\Big/\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} > 1.96$$

we have grounds for suspecting that they have been drawn
from different populations or that, if they have been drawn
from one population, they are not random samples.

10.6. Worked Example:

A sample of 19 pairs drawn at random from a bivariate normal
population shows a correlation coefficient of 0.65. (a) Is this
consistent with an assumed population correlation, $\rho = 0.40$? (b)
What are the 95% confidence limits for $\rho$ in the light of the
information provided by this sample? (c) If a second sample of 23 pairs
shows a correlation, $r = 0.40$, can this have been drawn from the
same parent population?

Treatment:

(a) $z = \tfrac{1}{2}\log_e (1.65/0.35) = 0.7753$

$Z = \tfrac{1}{2}\log_e (1.40/0.60) = 0.4236$

$(z - Z)(N - 3)^{1/2}$ is normally distributed about zero mean with
unit variance. In the present case

$(z - Z)(N - 3)^{1/2} = 1.4068$, which is less than 1.96.

Consequently the value $r = 0.65$ from a sample of 19 pairs is
compatible with an assumed population correlation of 0.40.

(b) To find the 95% confidence limits for $\rho$ on the basis of the
information provided by the present sample, we put

$|z - Z| \times 4 < 1.96$ or $|z - Z| < 0.49$.

Consequently

$0.7753 - 0.49 \le Z \le 0.7753 + 0.49$

giving $0.2853 \le Z \le 1.2653$ or $0.2778 \le \rho \le 0.8524$

and these are the required 95% confidence limits for $\rho$.

(c) The $z$-transforms of $r_1 = 0.65$ and of $r_2 = 0.40$ are respectively

$z_1 = 0.7753$ and $z_2 = 0.4236$.

On the assumption that the samples are from the same normal
population (or from equivalent normal populations), the variance
of their difference is equal to the sum of their variances. The
standard error of $z_1 - z_2$ is then $(1/16 + 1/20)^{1/2} = 0.3354$. Thus
$(z_1 - z_2)/0.3354$ is distributed normally about zero mean with unit
variance, and since $(0.7753 - 0.4236)/0.3354 = 1.049 < 1.96$, we
conclude that there is no ground to reject the hypothesis that the
two samples have been drawn from the same population (or from
equivalent populations).
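The three parts of this example can be sketched with the standard library's inverse hyperbolic tangent. The function names below are illustrative, and the normal 5% point 1.96 is taken from the text.

```python
import math

def z_statistic(r, n, rho):
    """(z - Z)(N - 3)^(1/2), with z = atanh(r) and Z = atanh(rho)."""
    return (math.atanh(r) - math.atanh(rho)) * math.sqrt(n - 3)

def confidence_limits(r, n, normal_point=1.96):
    """Limits for rho from z -/+ normal_point / (N - 3)^(1/2)."""
    z = math.atanh(r)
    half_width = normal_point / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

def two_sample_statistic(r1, n1, r2, n2):
    """(z1 - z2) divided by its standard error."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / standard_error

part_a = z_statistic(0.65, 19, 0.40)
lower, upper = confidence_limits(0.65, 19)
part_c = two_sample_statistic(0.65, 19, 0.40, 23)
print(round(part_a, 3), round(lower, 3), round(upper, 3), round(part_c, 3))
```

Because the code carries full precision instead of four-figure transforms, its results may differ from hand-worked values in the last decimal place.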

10.7. Combining Estimates of ρ. Let samples of $N_1$,
$N_2, \ldots N_k$ be drawn from a population and let the
corresponding values of $r$ be $r_1, r_2, \ldots r_k$. How shall we combine
these $k$ estimates of $\rho$, the population correlation, to obtain a
better estimate of that parameter?

Let the $z$-transforms of $r_i$, $(i = 1, 2, \ldots k)$ be $z_i$, $(i = 1,
2, \ldots k)$. Then these $k$ values are values of variates which
are approximately normally distributed with variances
$(N_i - 3)^{-1}$, $(i = 1, 2, \ldots k)$ about a common mean
$Z = \tanh^{-1}\rho$. If we "weight" these $k$ values with weights
$m_i$, $(i = 1, 2, \ldots k)$, the weighted mean is

$$\sum_{i=1}^{k} m_iz_i \Big/ \sum_{i=1}^{k} m_i = \sum_{i=1}^{k} M_iz_i, \quad \text{where } M_i = m_i \Big/ \sum_{i=1}^{k} m_i.$$

If the variance of $z_i$ is $\sigma_i^2$, that of $\sum_{i=1}^{k} M_iz_i$, $\sigma^2$, is given by

$$\sigma^2 = \sum_{i=1}^{k} M_i^2\sigma_i^2 = \sum_{i=1}^{k} m_i^2\sigma_i^2 \Big/ \Big(\sum_{i=1}^{k} m_i\Big)^2.$$

Now $\sigma^2$ is a function of the $k$ quantities $m_i$. Let us choose
these $k$ quantities in such a way that $\sigma^2$ is a minimum. The
necessary condition that this should be so is that

$$\partial\sigma^2/\partial m_i = 0$$

for all $i$,

i.e., for all $i$, $m_i\sigma_i^2 = \sum_{i=1}^{k} m_i^2\sigma_i^2 \Big/ \sum_{i=1}^{k} m_i$, a constant;

i.e., $m_i \propto 1/\sigma_i^2$, for all $i$.

The minimum-variance estimate of $Z$ is then

$$\sum_{i=1}^{k} (N_i - 3)z_i \Big/ \sum_{i=1}^{k} (N_i - 3)$$

and the required combined estimate of $\rho$ is

$$\hat{\rho} = \tanh\Big[\sum_{i=1}^{k} (N_i - 3)z_i \Big/ \sum_{i=1}^{k} (N_i - 3)\Big] \tag{10.7.1}$$

10.8. Worked Example:

Samples of 20, 30, 40 and 50 are drawn from the same parent
population, yielding values of $r$, the sample correlation coefficient,
of 0.41, 0.60, 0.51, 0.48 respectively. Use these values of $r$ to
obtain a combined estimate of the population correlation coefficient.

Treatment: We form the following table:

$r_i$      $z_i$      $N_i - 3$    $(N_i - 3)z_i$

0.41       0.436      17           7.412
0.60       0.693      27           18.711
0.51       0.563      37           20.831
0.48       0.523      47           24.581

Totals                128          71.535

$$\hat{Z} = \frac{\sum (N_i - 3)z_i}{\sum (N_i - 3)} = \frac{71.535}{128} = 0.5589$$

giving $\hat{\rho} = \tanh 0.5589 = 0.507$

Note: (1) To save work tables of the inverse hyperbolic functions
should be used to find the $z$-transforms of $r$.

(2) The weighted mean of $z$ obtained in the previous section and
used here is approximately normally distributed with variance
$1/\sum (N_i - 3)$. The accuracy of our estimate is then that to be
expected from a sample of $\sum (N_i - 3) + 3$ pairs. In the present
example this variance is $1/128 = 0.0078$, i.e., the standard error of
$\hat{Z}$ is 0.0883. Thus we may expect $Z$ to lie between 0.471 and 0.647,
that is, $\rho$ to lie between about 0.439 and 0.570.
The value of $\hat{Z}$ we have obtained may be treated as an individual
value calculated from a single sample of 131 pairs.
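Formula (10.7.1) is straightforward to compute directly. The sketch below uses an illustrative function name and takes the second sample's coefficient as 0.60, the value whose z-transform is 0.693 in the table.

```python
import math

def combine_correlations(rs, ns):
    """Minimum-variance combination of correlation coefficients (10.7.1)."""
    weights = [n - 3 for n in ns]                 # reciprocals of var(z_i)
    zs = [math.atanh(r) for r in rs]              # z-transforms
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    standard_error = 1 / math.sqrt(sum(weights))  # s.e. of the weighted mean
    return math.tanh(z_bar), z_bar, standard_error

rho_hat, z_bar, se = combine_correlations([0.41, 0.60, 0.51, 0.48],
                                          [20, 30, 40, 50])
print(round(rho_hat, 3), round(z_bar, 4), round(se, 4))
```

The weights N_i - 3 are the reciprocals of the variances of the z_i, which is the choice the minimisation in 10.7 leads to.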

10.9. Testing a Correlation Ratio. When the regression of
$y$ on $x$ in a sample of $N$ pairs of values from a bivariate
population is curvilinear, the correlation ratio of $y$ on $x$ for the sample,
$e_{yx}$, is defined by (7.4.4)

$$e_{yx}^2 = s_{\bar{y}}^2/s_y^2$$

where $s_{\bar{y}}^2$ is the variance of the means of the $x$-arrays and $s_y^2$ is
the sample variance of $y$. Moreover, $e_{yx}^2 - r^2$ may, provisionally,
be taken as a measure of the degree to which the regression
departs from linearity (but see below, 10.10).

We now have to devise a method of testing whether a given
$e_{yx}$ is significant, i.e., whether such a value of $e_{yx}$ could have
arisen by chance in random sampling.

We take Table 6.2.2 to be our correlation table and assume
our origin to be taken at the sample mean ($\bar{x} = 0 = \bar{y}$). Then
the sum of the squared deviations of the $y$'s from this mean is

$$\sum_i \sum_j f_{ij}y_j^2 = \sum_i \sum_j f_{ij}(y_j - \bar{y}_i + \bar{y}_i)^2,$$

where $\bar{y}_i$ is the mean of the $y$'s in the $x_i$th array. Expanding
the right-hand side,

$$\sum_i \sum_j f_{ij}y_j^2 = \sum_i \sum_j f_{ij}(y_j - \bar{y}_i)^2 + \sum_i \sum_j f_{ij}\bar{y}_i^2 + 2\sum_i \sum_j f_{ij}\bar{y}_i(y_j - \bar{y}_i)$$

The cross-product vanishes because if $\sum_j f_{ij} = n_i$, the
frequency of the $y$'s in the $x_i$th array,

$$\sum_i \sum_j f_{ij}\bar{y}_i(y_j - \bar{y}_i) = \sum_i \Big(\bar{y}_i \sum_j f_{ij}y_j\Big) - \sum_i \Big(\bar{y}_i^2 \sum_j f_{ij}\Big) = \sum_i n_i\bar{y}_i^2 - \sum_i n_i\bar{y}_i^2 = 0.$$

Consequently

$$\sum_i \sum_j f_{ij}y_j^2 = \sum_i \sum_j f_{ij}(y_j - \bar{y}_i)^2 + \sum_i \sum_j f_{ij}\bar{y}_i^2,$$

which, by 7.4,

$$= Ns_y^2(1 - e_{yx}^2) + Ns_y^2e_{yx}^2 \tag{10.9.1}$$

If now there are $p$ $x$-arrays, the $p$ values of $\bar{y}_i$ are subject to the
single restriction $\sum_i \sum_j f_{ij}y_j = \sum_i n_i\bar{y}_i = N\bar{y}$ ($= 0$, with our
present origin). Thus the term $Ns_y^2e_{yx}^2$ has $p - 1$ degrees of
freedom. Likewise, the $N$ $y_j$'s are subject to the same
restriction and, so, for the term $\sum_i \sum_j f_{ij}y_j^2$ there are $N - 1$ degrees of
freedom. Hence the term $Ns_y^2(1 - e_{yx}^2)$ involves $(N - 1)
- (p - 1) = N - p$ degrees of freedom.

On the null hypothesis that there is no association between
the variates in the population, each array may be regarded as a
random sample from the population of $y$'s. $\sum_i \sum_j f_{ij}(y_j - \bar{y}_i)^2$
is the variation sum of squares within arrays, while the term
$\sum_i \sum_j f_{ij}\bar{y}_i^2$ is the variation sum of squares between arrays. On
the null hypothesis, then, each of these sums divided by the
appropriate number of degrees of freedom should give unbiased
estimates of $\sigma_y^2$. The corresponding analysis of
covariance is, then:

Source of variation    Sum of squares             Degrees of freedom    Estimate of $\sigma_y^2$

Between arrays         $Ns_y^2e_{yx}^2$           $p - 1$               $Ns_y^2e_{yx}^2/(p - 1)$

Within arrays          $Ns_y^2(1 - e_{yx}^2)$     $N - p$               $Ns_y^2(1 - e_{yx}^2)/(N - p)$

Totals                 $Ns_y^2$                   $N - 1$

If, then, the value of

$$F = \frac{e_{yx}^2(N - p)}{(1 - e_{yx}^2)(p - 1)} \quad \text{for } \nu_1 = p - 1,\ \nu_2 = N - p \tag{10.9.2}$$

is found to be significant, we must reject the hypothesis that
there is no association of the kind indicated by the regression
function in the population. In other words, the value of $e_{yx}$
obtained from the sample data is significant.

10.10. Linear or Non-linear Regression? In 10.2 we assumed
that the regression of $y$ on $x$ was linear; in 10.9 we assumed it
to be non-linear. We must now complete this set of tests with
one which will indicate whether, on the sample data, regression
is linear or non-linear, a test which is in fact logically prior to
the other two.

To do this we return to equation

$$\sum_i \sum_j f_{ij}y_j^2 = \sum_i \sum_j f_{ij}(y_j - \bar{y}_i)^2 + \sum_i \sum_j f_{ij}\bar{y}_i^2$$

Let $b_{yx}$ be the coefficient of regression of $y$ on $x$; then, with our
assumption that $\bar{x} = 0 = \bar{y}$, the regression line is $y = b_{yx}x$.
Let $Y_i$ be the estimate of $y$ obtained from this equation when
$x = x_i$. Then we may write

$$\sum_i \sum_j f_{ij}y_j^2 = \sum_i \sum_j f_{ij}(y_j - \bar{y}_i)^2 + \sum_i \sum_j f_{ij}(\bar{y}_i - Y_i)^2 + \sum_i \sum_j f_{ij}Y_i^2$$

the cross-product again vanishing (why?)

The first term on the right-hand side of this equation still
represents the variation of $y$ within arrays; the second term
represents the variation of array-means from the regression
line; and the third term represents the variation of the
regression line about the sample mean.

We have already seen that the term

$$\sum_i \sum_j f_{ij}(y_j - \bar{y}_i)^2 = Ns_y^2(1 - e_{yx}^2)$$

has $N - p$ degrees of freedom. Furthermore,

$$\sum_i \sum_j f_{ij}(\bar{y}_i - Y_i)^2 + \sum_i \sum_j f_{ij}Y_i^2 = Ns_y^2e_{yx}^2$$

with $p - 1$ degrees of freedom. Now we may write

$$\sum_i \sum_j f_{ij}Y_i^2 = b_{yx}^2 \sum_i \sum_j f_{ij}x_i^2 = b_{yx}^2 \sum_i n_ix_i^2.$$

But $\sum_i n_ix_i^2$ is independent of the regression and, so, the
variation it represents depends only on $b_{yx}$ and to it, therefore,
corresponds but one degree of freedom. Moreover,

$$b_{yx}^2 \sum_i n_ix_i^2 = \frac{s_{xy}^2}{s_x^4} \cdot Ns_x^2 = Nr^2s_y^2.$$

Consequently the term $\sum_i \sum_j f_{ij}(\bar{y}_i - Y_i)^2 = Ns_y^2(e_{yx}^2 - r^2)$, with
$p - 2$ degrees of freedom.

On the hypothesis that regression is linear, the mean square
deviation of array-means from the regression line should not
be significantly greater than that of $y$ within arrays. The
analysis is shown in the following table.

Source of variation                        Sum of squares                Degrees of freedom    Mean square

Of array means about regression line       $Ns_y^2(e_{yx}^2 - r^2)$      $p - 2$               $Ns_y^2(e_{yx}^2 - r^2)/(p - 2)$

Of regression line about sample mean       $Nr^2s_y^2$                   $1$                   $Nr^2s_y^2$

Within arrays                              $Ns_y^2(1 - e_{yx}^2)$        $N - p$               $Ns_y^2(1 - e_{yx}^2)/(N - p)$

Totals                                     $Ns_y^2$                      $N - 1$

We may thus test

$$F = \frac{(e_{yx}^2 - r^2)(N - p)}{(1 - e_{yx}^2)(p - 2)} \quad \text{for } \nu_1 = p - 2,\ \nu_2 = N - p \tag{10.10.1}$$

If this value of $F$ is significant, the hypothesis of linear
regression must be rejected. It follows that it is not sufficient
to regard $e_{yx}^2 - r^2$ by itself as a measure of departure from
linearity, for $F$ depends also on $e_{yx}^2$, $N$ and $p$. If the value of
$F$ is not significant, there is no reason to reject the hypothesis
and analysis may proceed accordingly.

10.11. Worked Example:

Test for non-linearity of regression the data of 6.8 and 6.15.

Treatment: We have $e_{yx}^2 = 0.471$; $e_{yx}^2 - r^2 = 0.009$; $N = 1000$
and $p = 9$.

$$F = \frac{0.009 \times 991}{0.529 \times 7} = 2.409 \quad \text{for } \nu_1 = 7,\ \nu_2 = 991.$$

Using Table 9.6, we find that the 1% and 5% points of $F$ for
$\nu_1 = 7$, $\nu_2 = 991$ are 2.66 and 2.02. The value of $F$ is, therefore,
significant at the 5% level, but not at the 1% level. There is some
ground for believing that the regression is non-linear.
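The two variance ratios (10.9.2) and (10.10.1) can be written as small functions. This is a sketch with illustrative names; it takes N = 1000, which is what the quoted degrees of freedom N - p = 991 with p = 9 imply.

```python
def correlation_ratio_f(e_squared, n, p):
    """F of (10.9.2): between-array against within-array mean square."""
    return e_squared * (n - p) / ((1 - e_squared) * (p - 1))

def linearity_f(e_squared, r_squared, n, p):
    """F of (10.10.1): array means about the line against within arrays."""
    return (e_squared - r_squared) * (n - p) / ((1 - e_squared) * (p - 2))

e_squared = 0.471
r_squared = e_squared - 0.009     # the text gives e^2 - r^2 = 0.009
F_linearity = linearity_f(e_squared, r_squared, 1000, 9)
F_ratio = correlation_ratio_f(e_squared, 1000, 9)
print(round(F_linearity, 3), round(F_ratio, 1))
```

The first ratio is referred to F-tables with p - 2 and N - p degrees of freedom, the second with p - 1 and N - p.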



EXERCISES ON CHAPTER TEN 

1. Test for significance the value of r found in Exercise 4 to 
Chapter Six. 

2. A sample of 140 pairs is drawn at random from a bivariate
normal population. Grouped in 14 arrays, the data yielded $r = 0.35$
and $e_{yx} = 0.45$. Are these values consistent with the assumption
that the regression of $y$ on $x$ is linear?

3. Test the values of $e_{yx}$ and $r$ found in Exercise 8 to Chapter Six. Is
there reason to believe that the regression of $y$ on $x$ is non-linear?

4. Random samples of 10, 15 and 20 are drawn from a bivariate
normal population, yielding $r = 0.3$, 0.4, 0.49 respectively. Form
a combined estimate of $\rho$.

Solutions

2. Yes.

4. $\hat{\rho} = 0.43$ (2 d.p.).



CHAPTER ELEVEN

CHI-SQUARE AND ITS USES

11.1. Curve-fitting. What we are actually trying to do
when we "fit" a continuous curve to an observed frequency
distribution is to find a curve such that the given frequency
distribution is that of a random sample from the (hypothetical)
population defined by the curve we ultimately choose. Suppose
that the observed frequencies of the values $x_i$ of the variate $x$
are $n_i$ $(i = 1, 2, \ldots k)$, where $\sum_{i=1}^{k} n_i = N$. The value $x_i$
will in fact be the mid-value of a class-interval, $x_i \pm \tfrac{1}{2}h_i$, say.
Let $\phi(x)$ be the probability density of the continuous distribution
corresponding to the curve we fit to the data. Then the
theoretical frequency of $x_i$ in a sample of $N$ will be

$$N\int_{x_i - \frac{1}{2}h_i}^{x_i + \frac{1}{2}h_i} \phi(x)\,dx = Np_i,$$

say. The question we now ask is: How well does this
theoretical curve fit the observed data?

On the hypothesis that the fitted curve does in fact represent
the (hypothetical) population from which the set of observed
values of $x$ is a random sample, the divergence of observed
from theoretical frequencies must result from random sampling
fluctuations only. If, however, this total divergence is greater
than that which, to some specified degree of probability, is
likely to result from random sampling, we shall be forced to
conclude that, at this level of significance, the fitted curve does
not adequately represent the population of which the observed
$x$'s have been regarded as a sample. Crudely, the "fit" is
not good.
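As an illustration of how the theoretical frequencies are obtained, the sketch below assumes that the fitted curve is a normal density and that all class-intervals have the same width. Both assumptions, the function names and the mid-values are hypothetical and serve only as an example.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Distribution function of the normal curve, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def theoretical_frequencies(mid_values, width, n, mean, sd):
    """N * p_i, where p_i is the area under the curve over x_i -/+ width/2."""
    frequencies = []
    for mid in mid_values:
        p_i = (normal_cdf(mid + width / 2, mean, sd)
               - normal_cdf(mid - width / 2, mean, sd))
        frequencies.append(n * p_i)
    return frequencies

# Hypothetical class mid-values with unit class-interval, N = 1000.
mids = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
expected = theoretical_frequencies(mids, 1.0, 1000, 0.0, 1.0)
print([round(f, 1) for f in expected])
```

The frequencies do not quite sum to N because the fitted curve extends beyond the outermost class-intervals; in practice the end classes are usually extended to take up the tails.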

11.2. The Chi-Square Distribution. Of a sample of $N$ values
of a variate $x$, let $x$ take the value $x_1$ on $n_1$ occasions out of the
$N$, the value $x_2$ on $n_2$ occasions, and, finally, the value $x_k$ on
$n_k$ occasions, so that $\sum_{i=1}^{k} n_i = N$. If now the probability of
$x$ taking the value $x_i$ $(i = 1, 2, \ldots k)$ be $p_i$ $(i = 1, 2, \ldots k)$,
the probability of $x$ taking the value $x_1$ on $n_1$ occasions, the
value $x_2$ on $n_2$ occasions and so on, regardless of order, will
be (2.10)

$$P = \frac{N!}{\prod_{i=1}^{k} n_i!} \cdot \prod_{i=1}^{k} p_i^{n_i} \tag{11.2.1}$$

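For sufficiently large values of $n_i$ we may use Stirling's
approximation to $n!$, viz.,

$$n! \simeq (2\pi)^{1/2}\,n^{n + \frac{1}{2}} \exp(-n)$$

then

$$P \simeq \frac{(2\pi)^{1/2}N^{N + \frac{1}{2}}\exp(-N)}{\prod_{i=1}^{k}\big[(2\pi)^{1/2}n_i^{n_i + \frac{1}{2}}\exp(-n_i)\big]} \cdot \prod_{i=1}^{k} p_i^{n_i}$$

$$= \frac{N^{N + \frac{1}{2}}\exp(-N)}{(2\pi)^{(k-1)/2}\exp\big(-\sum n_i\big)} \cdot \prod_{i=1}^{k}\big[p_i^{n_i}/n_i^{n_i + \frac{1}{2}}\big]$$

But $\sum_{i=1}^{k} n_i = N$, and, therefore

$$P \simeq \frac{N^{N + \frac{1}{2}}}{(2\pi)^{(k-1)/2}\prod_{i=1}^{k} p_i^{1/2}} \cdot \prod_{i=1}^{k}\,[p_i/n_i]^{n_i + \frac{1}{2}}$$

or

$$P \simeq \frac{1}{(2\pi N)^{(k-1)/2}\prod_{i=1}^{k} p_i^{1/2}} \cdot \prod_{i=1}^{k}\,(Np_i/n_i)^{n_i + \frac{1}{2}} \tag{11.2.2}$$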

Now the expression $(2\pi N)^{(k-1)/2}\prod_{i=1}^{k} p_i^{1/2}$ is independent of the
$n_i$'s and, for a given $N$ and a given set of theoretical
probabilities, $p_i$, is constant. Therefore, putting $1/C$ for this
expression, we have, for sufficiently large $n_i$,

$$\log_e P \simeq \log_e C + \sum_{i=1}^{k} (n_i + \tfrac{1}{2})\log_e (Np_i/n_i).$$

Write

$$X_i = (n_i - Np_i)/(Np_i)^{1/2}, \quad \text{i.e., } n_i = Np_i + X_i(Np_i)^{1/2}.$$

Then, since

$$\sum_{i=1}^{k} n_i = N = \sum_{i=1}^{k} Np_i, \qquad \sum_{i=1}^{k} X_i(Np_i)^{1/2} = 0$$

indicating that only $k - 1$ of the $X_i$'s are independent. It
follows that

$$\log_e P \simeq \log_e C + \sum_{i=1}^{k}\big[Np_i + X_i(Np_i)^{1/2} + \tfrac{1}{2}\big] \times \log_e\big[Np_i/(Np_i + X_i(Np_i)^{1/2})\big]$$

$$= \log_e C - \sum_{i=1}^{k}\big[Np_i + X_i(Np_i)^{1/2} + \tfrac{1}{2}\big]\log_e\big[1 + X_i(Np_i)^{-1/2}\big]$$

If none of the $p_i$'s are of the order $N^{-1}$, we have, on expanding
the logarithm in a power series (see Abbott, Teach Yourself
Calculus, p. 332) and using the fact that $\sum_{i=1}^{k} X_i(Np_i)^{1/2} = 0$,

$$\log_e P \simeq \log_e C - \tfrac{1}{2}\sum_{i=1}^{k} X_i^2 - \tfrac{1}{2}\sum_{i=1}^{k} X_i(Np_i)^{-1/2}$$

or, since $N$ is large and terms in $N^{-1/2}$ and higher powers of
$N^{-1/2}$ may be neglected,

$$\log_e P \simeq \log_e C - \tfrac{1}{2}\sum_{i=1}^{k} X_i^2$$

or

$$P \simeq C\exp\Big(-\tfrac{1}{2}\sum_{i=1}^{k} X_i^2\Big) \tag{11.2.3}$$

Now the $n_i$'s are integers and since $n_i = Np_i + X_i(Np_i)^{1/2}$,
to a change of unity in $n_i$, the corresponding change in $X_i$,
$\Delta X_i$, will be given by $\Delta X_i = (Np_i)^{-1/2}$. Remembering that
only $k - 1$ of the $X_i$'s are independent, we have

$$P \simeq \frac{1}{(2\pi N)^{(k-1)/2}\prod_{i=1}^{k} p_i^{1/2}}\exp\Big(-\tfrac{1}{2}\sum_{i=1}^{k} X_i^2\Big) = \frac{\exp\big(-\tfrac{1}{2}\sum_{i=1}^{k} X_i^2\big)}{(2\pi)^{(k-1)/2}\prod_{i=1}^{k-1}(Np_i)^{1/2} \cdot p_k^{1/2}}$$

or, as $N$ tends to infinity, the probability of $n_1$ $x_1$'s, $n_2$ $x_2$'s, etc.,
$P$, is given approximately by the probability differential of a
continuous distribution defined by

$$P \simeq dp = B\exp\Big(-\tfrac{1}{2}\sum_{i=1}^{k} X_i^2\Big)\,dX_1\,dX_2 \ldots dX_{k-1} \tag{11.2.4}$$

Let us now consider the variate $X_i$ in more detail. We begin
by recalling that $\sum_{i=1}^{k} n_i = N$; if then we allow $N$, the sample
size, to vary, we have on the assumption that the frequencies
are independent,

$$\sum_{i=1}^{k} \sigma_{n_i}^2 = \sigma_N^2 \tag{11.2.5}$$

or

$$\operatorname{var} N = \sum_{i=1}^{k} \operatorname{var} n_i \tag{11.2.5(a)}$$

Again, if we treat the event of $x$ taking the particular value
$x_i$ as a success in $N$ trials, the frequency of success will
be distributed binomially with mean $Np_i$ and variance
$Np_i(1 - p_i)$. Now put $z_i = n_i - Np_i$. When $N$ is treated
as constant, $\operatorname{var}(z_i) = Np_i(1 - p_i)$. If, however, we write
$n_i = z_i + Np_i$ and allow $N$ to vary, $\operatorname{var}(n_i) = \operatorname{var}(z_i)
+ p_i^2\operatorname{var}(N)$, or $\operatorname{var}(n_i) = Np_i(1 - p_i) + p_i^2\operatorname{var}(N)$.
Summing over the $i$'s, we have

$$\operatorname{var}(N) = \sum_{i=1}^{k}\operatorname{var}(n_i) = N\sum_{i=1}^{k} p_i(1 - p_i) + \operatorname{var}(N) \cdot \sum_{i=1}^{k} p_i^2$$

or, since $\sum p_i = 1$, $\operatorname{var}(N) = N$.
Therefore,

$$\operatorname{var}(n_i) = Np_i(1 - p_i) + Np_i^2 = Np_i \tag{11.2.6}$$

When, then, $N$ is sufficiently large, the variates $X_i =
(n_i - Np_i)/(Np_i)^{1/2}$ are approximately normally distributed about
zero mean with unit variance. It remains, therefore, to find
the distribution of the sum of the squares of $k$ standardised
normal variates, subject to the restriction that only $k - 1$ of
them are independent.

Let $P \equiv (X_1, X_2, \ldots X_k)$ be a point in a $k$-dimensional
Euclidean space and put

$$\chi^2 = \sum_{i=1}^{k} X_i^2$$

subject to the restriction $\sum_{i=1}^{k} X_i(Np_i)^{1/2} = 0$. An element of
volume in the $X$-space now corresponds (see 7.11) to the
element of volume between the two hyperspheres $\sum_{i=1}^{k} X_i^2 = \chi^2$
and $\sum_{i=1}^{k} X_i^2 = (\chi + d\chi)^2$ subject to $\sum_{i=1}^{k} X_i(Np_i)^{1/2} = 0$. Using
arguments parallel to those used in 7.11, we find that the
probability that of the $N$ values of the variate $x$, $n_i$ are $x_i$
$(i = 1, 2, \ldots k)$, is approximately equal to the probability
that $\chi \equiv \big(\sum_{i=1}^{k} X_i^2\big)^{1/2}$ lies between $\chi$ and $\chi + d\chi$, viz.,

$$dp = A\exp(-\tfrac{1}{2}\chi^2)\,\chi^{k-2}\,d\chi \tag{11.2.7}$$

where, since the probability of $\chi$ taking some value between
0 and $\infty$ is unity,

$$1 = A\int_0^\infty \exp(-\tfrac{1}{2}\chi^2)\,\chi^{k-2}\,d\chi$$

giving

$$1/A = 2^{(k-3)/2}\,\Gamma[(k - 1)/2] \tag{11.2.8}$$

This defines a continuous distribution, which although
actually that of $\chi$ is sometimes erroneously called the
$\chi^2$-distribution.

Since of the $k$ $X_i$'s only $k - 1$ are independent, we may say
that $\chi$ has $k - 1$ degrees of freedom. Putting, conventionally,
$\nu = k - 1$, we have

$$dp = \frac{1}{2^{(\nu - 2)/2}\,\Gamma(\nu/2)}\,\chi^{\nu - 1}\exp(-\tfrac{1}{2}\chi^2)\,d\chi \tag{11.2.9}$$

It can be shown, however, that if instead of one equation of
constraint, there are $p$ such linear equations and the number of
degrees of freedom for $\chi$ is, consequently, $k - p$, (11.2.9) still
holds. When $\nu = 1$,

$$dp = (2/\pi)^{1/2}\exp(-\tfrac{1}{2}\chi^2)\,d\chi,$$

which is the normal distribution with probability density
doubled, due to the fact that $\chi \ge 0$, whereas in the normal
distribution the variate takes negative and positive values.

The $\chi^2$-distribution proper is obtained by writing (11.2.9) in
the form

$$dp = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\exp(-\tfrac{1}{2}\chi^2)\,(\chi^2)^{\nu/2 - 1}\,d(\chi^2) \tag{11.2.10}$$

If now we write $\chi^2 = S$,

$$dp = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\exp(-\tfrac{1}{2}S)\,S^{\nu/2 - 1}\,dS \tag{11.2.10(a)}$$

Then the probability that $\chi^2$ will not exceed a given value
$\chi_0^2$ is

$$P(\chi^2 \le \chi_0^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_0^{\chi_0^2}\exp(-\tfrac{1}{2}S)\,S^{\nu/2 - 1}\,dS$$

The right-hand side is, in fact, an Incomplete Γ-function, and
the above equation may be written in Karl Pearson's notation

$$P(\chi^2 \le \chi_0^2) = I\Big(\frac{\chi_0^2}{\sqrt{2\nu}},\ \frac{\nu - 2}{2}\Big) \tag{11.2.11}$$

Tables of this function are given in Tables of the Incomplete
Γ-function, edited by Pearson and published by the Biometrika
Office, University College, London. With them we can
evaluate $P(\chi^2 \le \chi_0^2)$ for $\nu \le 30$.

[Fig. 11.2. The $\chi^2$-distribution, $y = \exp(-\chi^2/2)\,(\chi^2)^{\nu/2 - 1}\big/2^{\nu/2}\,\Gamma(\nu/2)$, plotted against $\chi^2$ for $\nu = 2$, $\nu = 4$ and $\nu = 8$.]

For many practical purposes, however, we may use two
formulas which give an approximate value of $\chi^2$ exceeded
with a probability of 0.05. The approximate 0.05 point of
$\chi^2$ for $\nu \le 10$ is

$$1.55(\nu + 2),$$

while that for $35 \ge \nu > 10$ is

$$1.25(\nu + 5).$$
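The two rules of thumb are easily compared with tabulated 5% points. In the sketch below the function name is illustrative and the five comparison values are the standard two-decimal 5% points of chi-square for those degrees of freedom.

```python
def approx_chi2_5pct(nu):
    """Approximate 0.05 point of chi-square from the two rules above."""
    if nu <= 10:
        return 1.55 * (nu + 2)
    if nu <= 35:
        return 1.25 * (nu + 5)
    raise ValueError("the rules are stated only for nu up to 35")

# Tabulated 5% points used for comparison (two-decimal table values).
tabulated = {5: 11.07, 10: 18.31, 15: 25.00, 20: 31.41, 30: 43.77}
errors = {nu: approx_chi2_5pct(nu) - exact for nu, exact in tabulated.items()}
for nu in tabulated:
    print(nu, round(approx_chi2_5pct(nu), 2), tabulated[nu], round(errors[nu], 2))
```

For these degrees of freedom the approximation stays within about 0.3 of the tabulated value, which is adequate for a quick check but not for a borderline decision.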

Tables of $\chi^2$ for various values of $\nu \le 30$ are readily available.
They include those in:

(1) Statistical Tables for use in Biological, Agricultural and
Medical Research, by Sir R. A. Fisher and F. Yates;

(2) Statistical Methods for Research Workers, by Sir R. A.
Fisher; and

(3) Cambridge Elementary Statistical Tables, by D. V. Lindley
and J. C. P. Miller.

Our Table 11.2 is reproduced, by permission of the author
and publisher, from (2).

Table 11.2. Values of $\chi^2$ with Probability $P$ of Being
Exceeded in Random Sampling

$\nu$     P = 0.99    P = 0.95    P = 0.05    P = 0.01

 1        0.0002      0.004        3.84        6.64
 2        0.020       0.103        5.99        9.21
 3        0.115       0.35         7.82       11.34
 4        0.30        0.71         9.49       13.28
 5        0.55        1.14        11.07       15.09
 6        0.87        1.64        12.59       16.81
 7        1.24        2.17        14.07       18.48
 8        1.65        2.73        15.51       20.09
 9        2.09        3.32        16.92       21.67
10        2.56        3.94        18.31       23.21
11        3.05        4.58        19.68       24.72
12        3.57        5.23        21.03       26.22
13        4.11        5.89        22.36       27.69
14        4.66        6.57        23.68       29.14
15        5.23        7.26        25.00       30.58
16        5.81        7.96        26.30       32.00
17        6.41        8.67        27.59       33.41
18        7.02        9.39        28.87       34.80
19        7.63       10.12        30.14       36.19
20        8.26       10.85        31.41       37.57
21        8.90       11.59        32.67       38.93
22        9.54       12.34        33.92       40.29
23       10.20       13.09        35.17       41.64
24       10.86       13.85        36.42       42.98
25       11.52       14.61        37.65       44.31
26       12.20       15.38        38.88       45.64
27       12.88       16.15        40.11       46.96
28       13.56       16.93        41.34       48.28
29       14.26       17.71        42.56       49.59
30       14.95       18.49        43.77       50.89

Note: (1) The value of $\chi^2$ obtained from a sample may be significantly small. Fig.
11.2 will make this clear. When the value of $P$ obtained from the table is greater than
0.95, the probability of a smaller value of $\chi^2$ is less than 5% and this value must, therefore,
be regarded as significantly small. The "fit" is too good to be regarded without
suspicion! (see also Warning in 11.4).

(2) When $\nu > 30$, Fisher has shown that $\sqrt{2\chi^2}$ is approximately normally distributed
about mean $\sqrt{2\nu - 1}$ with unit variance. Thus $\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ may be considered
as a standardised normal variate for values of $\nu > 30$.
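Note (2) can be turned into a short calculation for degrees of freedom beyond the range of the table. This is a sketch with illustrative function names; the one-tailed 5% normal deviate 1.6449 and the exact 5% point 55.76 for 40 degrees of freedom are standard values supplied here for comparison, not taken from the text.

```python
import math

def normal_upper_tail(z):
    """Probability that a standardised normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_upper_tail_large_nu(chi2, nu):
    """Approximate P(chi-square > chi2) for nu > 30 by Fisher's rule."""
    z = math.sqrt(2.0 * chi2) - math.sqrt(2.0 * nu - 1.0)
    return normal_upper_tail(z)

def chi2_point_large_nu(normal_deviate, nu):
    """Invert the rule: chi2 = (deviate + sqrt(2 nu - 1))^2 / 2."""
    return 0.5 * (normal_deviate + math.sqrt(2.0 * nu - 1.0)) ** 2

point_40 = chi2_point_large_nu(1.6449, 40)       # approximate 5% point, nu = 40
check_40 = chi2_upper_tail_large_nu(point_40, 40)
print(round(point_40, 2), round(check_40, 4))
```

Because only large values of chi-square are in question, the one-tailed normal point is the appropriate one here.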
11.3. More Properties of Chi-Squares. The moment-generating
function of the $\chi^2$-distribution is

$$M(t) \equiv E(e^{\chi^2 t}) = E(e^{St})$$

$$= \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_0^\infty \exp(-\tfrac{1}{2}S)\,S^{\nu/2 - 1}\exp(St)\,dS$$

$$= \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_0^\infty S^{\nu/2 - 1}\exp[-\tfrac{1}{2}S(1 - 2t)]\,dS$$

Putting $\tfrac{1}{2}S(1 - 2t) = u$, $dS = \dfrac{2}{1 - 2t}\,du$,

$$M(t) = \frac{(1 - 2t)^{-\nu/2}}{\Gamma(\nu/2)}\int_0^\infty \exp(-u)\,u^{\nu/2 - 1}\,du = \frac{(1 - 2t)^{-\nu/2}}{\Gamma(\nu/2)}\,\Gamma(\nu/2)$$

i.e.,

$$M(t) = (1 - 2t)^{-\nu/2} \tag{11.3.1}$$

Expanding $[1 - 2t]^{-\nu/2}$, we have

$$M(t) = 1 + \nu t + \nu(\nu + 2)\frac{t^2}{2!} + \ldots$$

Hence $\mu_1' = \nu$; $\mu_2' = \nu(\nu + 2)$ and, consequently, $\mu_2 = \mu_2'
- \mu_1'^2 = 2\nu$.

The mean of the $\chi^2$-distribution for $\nu$ degrees of freedom is,
therefore, $\nu$ and the variance $2\nu$.

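The mean-moment-generating function in standardised units
is, then,

$$M_m(t) = \exp\Big(-\frac{\nu t}{\sqrt{2\nu}}\Big)\Big[1 - \frac{2t}{\sqrt{2\nu}}\Big]^{-\nu/2} \tag{11.3.2}$$

Hence

$$\log_e M_m(t) = -\frac{\nu t}{\sqrt{2\nu}} - \frac{\nu}{2}\log_e\Big[1 - \frac{2t}{\sqrt{2\nu}}\Big]$$

$$= -\frac{\nu t}{\sqrt{2\nu}} + \frac{\nu}{2}\Big[\frac{2t}{\sqrt{2\nu}} + \frac{1}{2} \cdot \frac{4t^2}{2\nu} + \text{higher powers of } \nu^{-1/2}\Big]$$

$$= \tfrac{1}{2}t^2 + \text{higher powers of } \nu^{-1/2}$$

Thus

$$M_m(t) \to \exp(\tfrac{1}{2}t^2) \text{ as } \nu \to \infty \tag{11.3.3}$$

and, comparing with (5.4.2), we see that the distribution tends
to normality about mean $\nu$, the standardised variate having unit variance.

The mode of the $\chi^2$-distribution is given by

$$\frac{d}{dS}\big[S^{\nu/2 - 1}\exp(-\tfrac{1}{2}S)\big] = 0$$

i.e., $(\nu/2 - 1)S^{\nu/2 - 2}\exp(-\tfrac{1}{2}S) - \tfrac{1}{2}S^{\nu/2 - 1}\exp(-\tfrac{1}{2}S) = 0$

or

$$S = \chi^2 = \nu - 2 \tag{11.3.4}$$

Using Karl Pearson's measure of skewness, (mean - mode)/
standard deviation, the skewness of $\chi^2$ is given by

$$\frac{\nu - (\nu - 2)}{\sqrt{2\nu}} = \sqrt{\frac{2}{\nu}} \tag{11.3.5}$$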
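The mean, variance, mode and Pearson skewness can be checked by direct numerical integration of the density in S. The sketch below uses a simple midpoint rule; the function names, the upper limit of integration and the number of steps are arbitrary illustrative choices.

```python
import math

def chi2_density(s, nu):
    """Density of S = chi-square with nu degrees of freedom."""
    return (math.exp(-0.5 * s) * s ** (nu / 2 - 1)
            / (2 ** (nu / 2) * math.gamma(nu / 2)))

def summarise(nu, upper=150.0, steps=100_000):
    """Midpoint-rule estimates of total area, mean, variance, mode, skewness."""
    h = upper / steps
    total = mean = second = 0.0
    mode, peak = 0.0, 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        f = chi2_density(s, nu)
        total += f * h
        mean += s * f * h
        second += s * s * f * h
        if f > peak:                 # track the grid point of highest density
            peak, mode = f, s
    variance = second - mean ** 2
    skewness = (mean - mode) / math.sqrt(variance)
    return total, mean, variance, mode, skewness

total, mean, variance, mode, skewness = summarise(8)
print(round(total, 4), round(mean, 3), round(variance, 3),
      round(mode, 2), round(skewness, 3))
```

For 8 degrees of freedom the estimates should agree with the mean 8, the variance 16, the mode 6 and the skewness 0.5 given by the formulas above.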



Now consider $p$ $\chi^2$-variates, $\chi_1^2, \chi_2^2, \ldots \chi_p^2$ with $\nu_1, \nu_2, \ldots \nu_p$
degrees of freedom respectively. If $S_i = \chi_i^2$ $(i = 1, 2, 3, \ldots p)$,
the moment-generating function of $S_i$ with respect to the
origin is $[1 - 2t]^{-\nu_i/2}$.

But the moment-generating function of $\sum_{i=1}^{p} S_i$ is, by definition,

$$E\Big[\exp\Big(t\sum_{i=1}^{p} S_i\Big)\Big] = E\Big[\prod_{i=1}^{p}\exp(S_it)\Big] = \prod_{i=1}^{p} E(\exp S_it)$$

since the $S$'s are independent.
Consequently, the m.g.f. of

$$\sum_{i=1}^{p} S_i = \prod_{i=1}^{p}(1 - 2t)^{-\nu_i/2} = (1 - 2t)^{-\frac{1}{2}\sum \nu_i} = (1 - 2t)^{-\nu/2}, \quad \text{where } \nu = \sum_{i=1}^{p}\nu_i.$$

Hence we have the important theorem:

If the independent positive variates $x_i$, $(i = 1, 2, \ldots p)$,
are each distributed like $\chi^2$ with $\nu_i$ degrees of freedom,
$(i = 1, 2, \ldots p)$, then $\sum_{i=1}^{p} x_i$ is distributed like $\chi^2$ with
$\nu = \sum_{i=1}^{p}\nu_i$ degrees of freedom.

It follows at once that if the sum of two independent positive
variates $x_1$ and $x_2$ is distributed like $\chi^2$ with $\nu$ degrees of
freedom, and if $x_1$ is distributed like $\chi^2$ with $\nu_1$ degrees of
freedom, $x_2$ is distributed like $\chi^2$ with $\nu_2 = \nu - \nu_1$ degrees of
freedom.

Some consequences of this additive property are:

(1) Suppose we conduct a set of $n$ similar experiments to
test a hypothesis. Let the values of $\chi^2$ corresponding
to these experiments be $\chi_i^2$ $(i = 1, 2, \ldots n)$ for $\nu_i$
$(i = 1, 2, \ldots n)$ degrees of freedom. If then we write
$\chi^2 = \sum_{i=1}^{n}\chi_i^2$, this value of $\chi^2$ will in fact be the value
obtained from pooling the data of the $n$ experiments and
will correspond to $\nu = \sum_{i=1}^{n}\nu_i$ degrees of freedom. For
example:

Three experiments designed to test a certain
hypothesis yielded

$\chi_1^2 = 9.00$ for $\nu_1 = 5$; $\chi_2^2 = 13.2$ for $\nu_2 = 10$;
$\chi_3^2 = 19.1$ for $\nu_3 = 15$.

None of these on its own is significant at the 10%
point, as the reader may verify. Their sum $\chi^2
= 41.3$ for $\nu = 30$ d.f. is, however, significant at the
10% point. Thus we see that the data of the three
experiments, when pooled, give us less reason for
confidence in our hypothesis than do those of any one of the
experiments taken singly.

(2) Next assume that a number of tests of significance
(three, say) have yielded probabilities $p_1, p_2, p_3$. We
know nothing more about the tests than this, yet, in
view of the experience of (1), we require to obtain some
over-all probability corresponding to the pooled data.

Now any probability may be translated into a value
of $\chi^2$ for an arbitrarily chosen $\nu$. But when $\nu = 2$,
$\log_e p = -\tfrac{1}{2}\chi^2$. This is very convenient, for if $\chi_1^2$,
$\chi_2^2$, $\chi_3^2$ are the values of $\chi^2$ for $\nu = 2$ corresponding to
$p_1, p_2, p_3$ respectively, the pooled value of $\chi^2$ is
$-2(\log_e p_1 + \log_e p_2 + \log_e p_3)$ for $\nu = 2 + 2 + 2 = 6$ d.f. and the
required pooled probability is obtained. For example :

    p_1 = 0.150     log_e p_1 = - 1.897120
    p_2 = 0.250     log_e p_2 = - 1.386294
    p_3 = 0.350     log_e p_3 = - 1.049822
                                ----------
                                - 4.333236
                                    x - 2
                                ----------
                                  8.666472

The 20% point of $\chi^2$ for $\nu = 6$ is  8.558
 „  10%   „    „    „   $\nu = 6$  „ 10.645

We see then that the pooled probability is slightly less
than 0.20. To find this more accurately, we notice that
the pooled value of $\chi^2$ exceeds the 20% point by 0.108,
while the 10% point exceeds the 20% point by 2.087.

Now¹ $\log_{10} 0.20 = -0.69897$ and $\log_{10} 0.10 = -1$.
The difference between these is 0.30103.
Therefore the pooled value of $\chi^2$ corresponds to

$$-0.69897 - \frac{0.108}{2.087} \times 0.30103 = -0.7146.$$

The antilog of -0.7146 is 0.1929. Interpolating thus,
then, we find the required pooled probability to be 0.193,
to three decimal places.
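Both consequences of the additive property can be checked numerically. The sketch below is illustrative only: the helper name `chi2_sf_even` is hypothetical, and it relies on the closed form of the $\chi^2$ tail that holds when the number of degrees of freedom is even, so no table or interpolation is needed.

```python
import math

def chi2_sf_even(x, nu):
    """P(chi-square >= x) for an EVEN number of degrees of freedom nu.

    For even nu the upper tail has the closed form
    exp(-x/2) * sum_{k < nu/2} (x/2)**k / k!.
    """
    half = x / 2.0
    term = total = math.exp(-half)      # k = 0 term
    for k in range(1, nu // 2):
        term *= half / k                # builds (x/2)**k / k! recursively
        total += term
    return total

# (1) Pooling experiments: chi-square values and degrees of freedom add.
experiments = [(9.00, 5), (13.2, 10), (19.1, 15)]
pooled_chi2 = sum(c for c, _ in experiments)
pooled_nu = sum(nu for _, nu in experiments)
pooled_p = chi2_sf_even(pooled_chi2, pooled_nu)

# (2) Combining probabilities: each p corresponds to -2 ln p on 2 d.f.
probs = [0.150, 0.250, 0.350]
combined_chi2 = -2.0 * sum(math.log(p) for p in probs)
combined_nu = 2 * len(probs)
combined_p = chi2_sf_even(combined_chi2, combined_nu)

print(f"pooled:   chi2 = {pooled_chi2:.1f} on {pooled_nu} d.f., P = {pooled_p:.3f}")
print(f"combined: chi2 = {combined_chi2:.3f} on {combined_nu} d.f., P = {combined_p:.3f}")
```

The closed form applies here because both pooled totals (30 and 6 degrees of freedom) are even; for odd $\nu$ a table or a normal-tail term would be needed instead.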

11.4. Some Examples of the Application of $\chi^2$ :

(A) Theoretical Probabilities Given

Twelve dice were thrown 26,306 times and a 5 or a 6 was counted
as a success. The number of successes in each throw was noted,
with the following results (Weldon's data) :

    Number of                   Number of
    successes.   Frequency.     successes.   Frequency.

        0           185             6          3,067
        1         1,149             7          1,331
        2         3,265             8            403
        3         5,475             9            105
        4         6,114            10             18
        5         5,194
                                  Total        26,306

Is there evidence that the dice are biased ?

Treatment : We set up the hypothesis that the dice are unbiased.
This means that the probability of throwing a 5 or a 6, a success,
is 2/6 = 1/3 and the probability of a failure, 2/3. The theoretical
frequency generating function on this hypothesis is then

$$26{,}306\,(\tfrac{2}{3} + \tfrac{1}{3}t)^{12}.$$

¹ Either natural logarithms (base e) or common logarithms (base
10) may be used, since $\log_e x = \log_e 10 \times \log_{10} x$.

The estimated frequencies are then found to be (correct to the
nearest integer) :

    Number of successes            0      1      2      3      4      5      6

    Observed frequency (o)       185  1,149  3,265  5,475  6,114  5,194  3,067
    Theoretical frequency (e)    203  1,217  3,345  5,576  6,273  5,018  2,927

    Number of successes            7      8      9     10    Totals

    Observed frequency (o)     1,331    403    105     18    26,306
    Theoretical frequency (e)  1,254    392     87     14    26,306

Then $\chi^2 = \sum \dfrac{(o - e)^2}{e} = 35.9$.

There are 11 classes and one restriction, namely, the size of the
sample total, has been imposed. The number of degrees of freedom
is thus 10.

The 1% level of $\chi^2$ for $\nu = 10$ is 23.21. The value of $\chi^2$ obtained
is then highly significant and the hypothesis that the dice are
unbiased is, therefore, rejected.
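The calculation for Weldon's data can be reproduced as a short sketch. Two assumptions are made here and are not stated in the text: the final class is treated as "10 or more successes" (which reproduces the tabulated theoretical frequency of 14), and the expected frequencies are left unrounded, so the statistic comes out slightly lower than the figure worked from the rounded table.

```python
from math import comb

observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 18]
n_throws = sum(observed)            # 26,306 throws of twelve dice
n_dice, p = 12, 1 / 3               # a 5 or a 6 counts as a success

# Binomial probabilities for 0..9 successes; the last class is
# taken to be "10 or more" (an assumption, see the lead-in).
probs = [comb(n_dice, k) * p**k * (1 - p) ** (n_dice - k) for k in range(10)]
probs.append(1.0 - sum(probs))
expected = [n_throws * q for q in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1             # one constraint: the totals agree

print([round(e) for e in expected])
print(f"chi2 = {chi2:.2f} on {dof} d.f. (1% point for 10 d.f. is 23.21)")
```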

(B) Theoretical Probabilities not Given

The following table gives the distribution of the length, measured
in cm., of 294 eggs of the Common Tern collected in one small
coastal area :

    Length                        Length
    (central                      (central
    values).     Frequency.       values).     Frequency.

      3.5            1              4.2           54
      3.6            1              4.3           34
      3.7            6              4.4           12
      3.8           20              4.5            6
      3.9           35              4.6            1
      4.0           53              4.7            2
      4.1           69

Test whether these results are consistent with the hypothesis that
egg-length is normally distributed. (L.U.)

Treatment : We have first to fit a normal curve. This entails
calculating the sample mean and sample variance. The sample
mean is an unbiased estimate of the population mean. The sample
variance may be taken to be an unbiased estimate of the population
variance, since N = 294 is large.

We find by the usual methods that

$\bar{x} = 4.094$ ; $s = 0.184$.

Estimated frequencies are obtained by the method of Chapter 5 and we
find

    Length                      3.5   3.6   3.7    3.8    3.9    4.0    4.1

    Observed frequency (o)        1     1     6     20     35     53     69
                                (pooled : 8)
    Estimated frequency (e)     0.4   2.0   6.7   17.9   37.0   55.3   62.6
                                (pooled : 9.1)

    Length                      4.2    4.3    4.4    4.5   4.6   4.7    Total

    Observed frequency (o)       54     34     12      6     1     2     294
                                                    (pooled : 9)
    Estimated frequency (e)    54.1   33.8   16.3    6.1   1.4   0.4     294
                                                    (pooled : 7.9)

$$\chi^2 = \sum \frac{(o - e)^2}{e} = \sum \frac{o^2}{e} - 2\sum o + \sum e ; \quad \text{but } N = \sum o = \sum e,$$

so that $\chi^2 = \sum \dfrac{o^2}{e} - N$.

Note : (1) This form is preferable where the estimated frequencies
are not integers and, consequently, o - e is not integral, since it
removes the labour of squaring non-integers.

(2) Because we derived the distribution of $\chi^2$ on the assumption that
Stirling's approximation for n! held, i.e., that the class frequencies
were sufficiently large, we group together into one class the first 3 classes
with theoretical frequencies < 10 and into another class the last 3 classes
with frequencies < 10. This effectively reduces the number of classes
to 9. There is some divergence of opinion as to what constitutes a
"low frequency" in this connection. Fisher (Statistical Methods
for Research Workers) has used the criterion < 5; Aitken (Statistical
Mathematics) prefers < 10; while Kendall (Advanced Theory of
Statistics) favours < 20. The reader would do well to compromise
with a somewhat elastic figure around 10.

$$\chi^2 = \frac{8^2}{9.1} + \frac{20^2}{17.9} + \frac{35^2}{37.0} + \frac{53^2}{55.3} + \frac{69^2}{62.6} + \frac{54^2}{54.1} + \frac{34^2}{33.8} + \frac{12^2}{16.3} + \frac{9^2}{7.9} - 294$$

$$= 296.5 - 294 = 2.5$$

We must now calculate the corresponding degrees of freedom.
There are effectively 9 classes. One restriction results from the
fact that the total observed and total estimated frequencies are
made to agree. Also, from the sample data, we have estimated
both the mean and variance of the theoretical parent population.
We have then 3 constraints and, consequently, there are 9 - 3 =
6 degrees of freedom. Entering the table at $\nu = 6$, we find that
the chance of such a value of $\chi^2$ being obtained lies between
P = 0.95 and P = 0.50, at approximately P = 0.87. We conclude,
therefore, that the fit is good but not unnaturally good, and,
consequently, there is good reason to believe that egg-length is normally
distributed.
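As a check on the arithmetic, the following sketch evaluates $\chi^2$ for the nine pooled classes both from the definition and from the shortcut form $\sum o^2/e - N$, and then uses the closed form of the tail probability that is available because $\nu = 6$ is even. The variable names are illustrative.

```python
import math

# Nine classes after pooling the first three and the last three length classes.
observed = [8, 20, 35, 53, 69, 54, 34, 12, 9]
estimated = [9.1, 17.9, 37.0, 55.3, 62.6, 54.1, 33.8, 16.3, 7.9]
N = sum(observed)

chi2_direct = sum((o - e) ** 2 / e for o, e in zip(observed, estimated))
chi2_short = sum(o * o / e for o, e in zip(observed, estimated)) - N

dof = len(observed) - 3             # total, mean and variance are all fitted

# Upper tail for even dof: exp(-x/2) * sum_{k < dof/2} (x/2)**k / k!
half = chi2_short / 2.0
tail = math.exp(-half) * sum(half**k / math.factorial(k) for k in range(dof // 2))

print(f"chi2 = {chi2_short:.2f} on {dof} d.f., P = {tail:.2f}")
```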

Warning : It may happen that the fit is unnaturally good.
Suppose the value of $\chi^2$ obtained was such that its probability of
occurrence was 0.999. Fisher has pointed out that in this case if
the hypothesis were true, such a value would occur only once in a
thousand trials. He adds :

"Generally such cases are demonstrably due to the use of
inaccurate formulae, but occasionally small values of $\chi^2$ beyond
the expected range do occur. . . . In these cases the hypothesis
considered is as definitely disproved as if P had been 0.001"
(Statistical Methods for Research Workers, 11th Edition, p. 81).

11.5. Independence, Homogeneity, Contingency Tables.

When we use the $\chi^2$-distribution to test goodness of fit we are
testing for agreement between expectation and observation.
Tests of independence and homogeneity also come under this
general heading.

Suppose we have a sample of individuals which we can
classify in two, or more, different ways. Very often we want
to know whether these classifications are independent. For
instance, we may wish to determine whether deficiency in a
certain vitamin is a factor contributory to the development of
a certain disease. We take a sample of individuals and classify
them in two ways : into those deficient in the vitamin and
those not, and into those suffering from the disease and those
not. On the hypothesis that there is no link-up between the
disease and the vitamin deficiency (i.e., that the classifications
are independent), we calculate the expected number of
individuals in each of the four sub-groups resulting from the
classification : those deficient in the vitamin and diseased ;
those deficient in the vitamin and not diseased ; those diseased
but not deficient in the vitamin ; and those neither diseased
nor deficient in the vitamin. We thus obtain four observed
frequencies and four expected frequencies. If the divergence
between observation and expectation is greater than is probable
(to some specified degree of probability) as a result of random
sampling fluctuations alone, we shall have to reject the
hypothesis and conclude that there is a link-up between
vitamin-deficiency and the disease.

It is usual to set out the sample data in a contingency table (a
table in which the frequencies are grouped according to some
non-metrical criterion or criteria). In the present case, where
we have two factors of classification, resulting in the division
of the sample in two different ways, we have a 2 x 2
contingency table. Now suppose classification 1 divides the sample
of N individuals into two classes, A and not-A, and
classification 2 divides the sample into two classes, B and not-B. Let
the observed frequency in the sub-class "A and B" be a ;
that in "not-A and B" be b ; that in "not-B and A" be c ;
and that in "not-A and not-B" be d ; we may display this in
the 2 x 2 contingency table

                 A.         Not-A.       Totals.

    B            a            b          a + b
    Not-B        c            d          c + d

    Totals     a + c        b + d        a + b + c + d = N

On the assumption that the classifications are independent, we
have, on the evidence of the sample data, and working from
margin totals,

    the probability of being an A     = (a + c)/N ;
     „       „        „    „  a not-A = (b + d)/N ;
     „       „        „    „  a B     = (a + b)/N ;
     „       „        „    „  a not-B = (c + d)/N.

Then the probability of being "A and B" will be
$\dfrac{a + c}{N} \times \dfrac{a + b}{N}$ and the expected frequency for this sub-class in a
sample of N will be $\dfrac{(a + c)(a + b)}{N}$. Likewise, the probability
of being "B and not-A" is $\dfrac{(a + b)(b + d)}{N^2}$ and the
corresponding expected frequency $\dfrac{(a + b)(b + d)}{N}$. In this way we
can set up a table of corresponding expected frequencies :

                    A.                    Not-A.

    B          (a + c)(a + b)/N      (b + d)(a + b)/N

    Not-B      (a + c)(c + d)/N      (b + d)(c + d)/N

Providing the frequencies are not too small, we may then
use the $\chi^2$-distribution to test for agreement between
observation and expectation, as we did in the goodness-of-fit tests, and
this will, in fact, be testing our hypothesis of independence.
$\chi^2$, in this case, will be given by (writing N for a + b + c + d)

$$\chi^2 = \frac{\left[a - \frac{(a+c)(a+b)}{N}\right]^2}{\frac{(a+c)(a+b)}{N}} + \frac{\left[b - \frac{(b+d)(a+b)}{N}\right]^2}{\frac{(b+d)(a+b)}{N}} + \frac{\left[c - \frac{(a+c)(c+d)}{N}\right]^2}{\frac{(a+c)(c+d)}{N}} + \frac{\left[d - \frac{(b+d)(c+d)}{N}\right]^2}{\frac{(b+d)(c+d)}{N}}$$

$$= \frac{(ad - bc)^2}{N}\left[\frac{1}{(a+c)(a+b)} + \frac{1}{(b+d)(a+b)} + \frac{1}{(a+c)(c+d)} + \frac{1}{(b+d)(c+d)}\right]$$

$$= \frac{(ad - bc)^2}{N} \cdot \frac{N^2}{(a+b)(a+c)(c+d)(b+d)}$$

or

$$\chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a+b)(a+c)(c+d)(b+d)} \qquad (11.5.1)$$

It remains to determine the appropriate number of degrees of
freedom for this value of $\chi^2$. We recall that the expected
values are calculated from the marginal totals of the sample.
As soon as we have calculated one value, say that for "A and B",
the others are fixed and may be written in by subtraction from
the marginal totals. Thus the observed values can differ
from the expected values by only 1 degree of freedom.
Consequently

for a 2 x 2 contingency table, the number of degrees of
freedom for $\chi^2$ is one.

Worked Example : A certain type of surgical operation can be
performed either with a local anaesthetic or with a general anaesthetic.
Results are given below :

                 Alive.     Dead.

    Local         511        24
    General       173        21

Test for any difference in the mortality rates associated with the
different types of anaesthetic. (R.S.S.)

Treatment : Our hypothesis is that there is no difference in the
mortality rates associated with the two types of anaesthetic. The
contingency table is :

                 Alive.     Dead.     Totals.

    Local         511        24        535
    General       173        21        194

    Totals        684        45        729

Using the marginal totals, the expected values are (correct to the
nearest integer) :

                 Alive.     Dead.     Totals.

    Local         502        33        535
    General       182        12        194

    Totals        684        45        729

Accordingly, by (11.5.1), $\chi^2 = 9.88$ for $\nu = 1$ d.f. The probability of this
value of $\chi^2$ is 0.0017 approximately. The value of $\chi^2$ obtained is,
therefore, highly significant, and we conclude that there is a
difference between the mortality rates associated with the two types of
anaesthetics.
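A minimal sketch of the 2 x 2 calculation follows. It evaluates the shortcut (11.5.1) and, as a cross-check, the sum of $(o - e)^2/e$ over the four cells using unrounded expected frequencies. The function names are illustrative, and the tail probability uses the fact that for one degree of freedom the $\chi^2$ tail is the two-sided normal tail.

```python
import math

def chi2_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table via the shortcut (11.5.1)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (a + c) * (c + d) * (b + d))

def chi2_from_expected(a, b, c, d):
    """Same statistic from the definition: sum of (o - e)^2 / e over four cells."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n       # expected frequency from the margins
            total += (cells[i][j] - e) ** 2 / e
    return total

a, b, c, d = 511, 24, 173, 21               # local/general by alive/dead
chi2 = chi2_2x2(a, b, c, d)
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square on 1 d.f.

print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```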

Suppose we were to regard the frequencies in the general
2 x 2 contingency table as representing two samples distributed
according to some factor of classification (being A or not-A)
into 2 classes, thus :

                   A.        Not-A.      Totals.

    Sample I       a           b         a + b
    Sample II      c           d         c + d

    Totals       a + c       b + d       a + b + c + d (= N)

We now ask : "On the evidence provided by the data, can
these samples be regarded as drawn from the same
population ?" Assuming that they are from the same population,
the probability that an individual falls into the class A, say,
will be the same for both samples. Again basing our estimates
on the marginal totals, this probability will be (a + c)/N and
the estimated or expected frequency on this hypothesis of the
A's in the first sample will be (a + c) x (a + b)/N. In this
way we calculate the expected frequencies in both classes for
the two samples, and, if the divergence between expectation
and observation is greater to some specified degree of
probability than the hypothesis of homogeneity demands, it will
be revealed by a test which is mathematically identical with
that for independence.

11.6. Homogeneity Test for 2 x k Table. The individuals
in two very large populations can be classed into one or other of
k categories. A random sample (small compared to the size of
the population) is drawn from each population, and the
following frequencies are observed in the categories :

    Category.      1        2      . . .     t      . . .     k       Total.

    Sample 1     n_11     n_12     . . .    n_1t    . . .    n_1k      N_1
    Sample 2     n_21     n_22     . . .    n_2t    . . .    n_2k      N_2

Devise a suitable form of the $\chi^2$-criterion for testing the
hypothesis that the probability that an individual falls into the tth
category (t = 1, 2, . . . k) is the same in the two populations.
Derive the appropriate number of degrees of freedom for $\chi^2$.
(L.U.)

This is a homogeneity problem, for the question could well
be reformulated to ask whether the two samples could have
come from the same population.

On the assumption that the probability that an individual
falls into the tth class is the same for the two populations, we use
the marginal totals, $n_{1t} + n_{2t}$ (t = 1, 2, . . . k), to estimate
these probabilities. Thus our estimate of the probability of
an individual falling into the tth class, on this assumption, is
$(n_{1t} + n_{2t})/(N_1 + N_2)$. The expected frequency, on this
assumption, in the tth class of the 1st Sample is therefore
$N_1(n_{1t} + n_{2t})/(N_1 + N_2)$ and that for the same class of the
2nd Sample is $N_2(n_{1t} + n_{2t})/(N_1 + N_2)$. Consequently,

$$\chi^2 = \sum_{t=1}^{k}\left\{\frac{[N_2 n_{1t} - N_1 n_{2t}]^2}{N_1(N_1 + N_2)(n_{1t} + n_{2t})} + \frac{[N_1 n_{2t} - N_2 n_{1t}]^2}{N_2(N_1 + N_2)(n_{1t} + n_{2t})}\right\}$$

or

$$\chi^2 = \sum_{t=1}^{k}\frac{(N_2 n_{1t} - N_1 n_{2t})^2}{N_1 N_2 (n_{1t} + n_{2t})} = N_1 N_2 \sum_{t=1}^{k}\frac{(n_{1t}/N_1 - n_{2t}/N_2)^2}{n_{1t} + n_{2t}} \qquad (11.6.1)$$

How many degrees of freedom must be associated with this
value of $\chi^2$ ? To construct the table of expected frequencies,
we must know the grand total, one of the sample totals and
k - 1 of the class totals. There are, therefore, 1 + 1 + k - 1
= k + 1 equations of constraint, and, since there are 2k
theoretical frequencies to be calculated, $\nu$ = 2k - (k + 1)
= k - 1 degrees of freedom.
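The equivalence of (11.6.1) with the cell-by-cell definition can be checked on a small table. The counts below are hypothetical, chosen only for illustration, and the function names are not from the text.

```python
def chi2_two_samples(sample1, sample2):
    """Chi-square for a 2 x k table using the shortcut (11.6.1)."""
    n1, n2 = sum(sample1), sum(sample2)
    return n1 * n2 * sum(
        (a / n1 - b / n2) ** 2 / (a + b) for a, b in zip(sample1, sample2)
    )

def chi2_definition(sample1, sample2):
    """Same statistic as the sum of (o - e)^2 / e over all 2k cells."""
    n1, n2 = sum(sample1), sum(sample2)
    total = 0.0
    for a, b in zip(sample1, sample2):
        p = (a + b) / (n1 + n2)          # pooled estimate for this category
        for o, n in ((a, n1), (b, n2)):
            e = n * p                    # expected frequency in this cell
            total += (o - e) ** 2 / e
    return total

# Hypothetical frequencies in k = 3 categories (illustration only).
sample1 = [30, 45, 25]
sample2 = [20, 35, 45]

chi2 = chi2_two_samples(sample1, sample2)
dof = len(sample1) - 1                   # k - 1 degrees of freedom

print(f"chi2 = {chi2:.3f} on {dof} d.f.")
```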

11.7. h x k Table. In the case of an h x k table, we follow
exactly the same principles, whether we are testing for
independence or homogeneity. To calculate the appropriate number
of degrees of freedom, we note that there are h x k theoretical
frequencies to be calculated. Given the grand total, we require
h - 1 and k - 1 of the marginal totals to be given also. There
are thus h + k - 1 equations of constraint, and consequently
$\nu$ = hk - (h + k - 1) = (h - 1)(k - 1) degrees of freedom.

Worked Example : A Ministry of Labour Memorandum on Carbon
Monoxide Poisoning (1945) gives the following data on accidents
due to gassing by carbon monoxide :





                                       1941.    1942.    1943.    Totals.

    At blast furnaces                    24       20       19        63
    At gas producers                     28       34       41       103
    At gas ovens and works               26       26       10        62
    In distribution and use of gas       80      108      123       311
    Miscellaneous sources                68       51       32       151

    Totals                              226      239      225       690

Is there significant association between the site of the accident
and the year ?



Treatment : On the assumption that there is no association
between the origin of an accident and the year, the probability of an
accident in any given class will be constant for that class.

The probability of an accident at a blast furnace is estimated from
the data to be (24 + 20 + 19)/690 = 63/690. Hence the expected
frequency of accidents for this source in a yearly total of 226 will
be 63 x 226/690 = 20.64.

Proceeding in this way, we set up the following table :

                                   1941.            1942.            1943.
                                 o.      e.       o.      e.       o.      e.

    Blast furnaces               24    20.64      20    21.82      19    20.54
    Gas producers                28    33.74      34    35.68      41    33.59
    Gas-works and coke ovens     26    20.31      26    21.48      10    20.22
    Gas use and distribution     80   101.86     108   107.72     123   101.41
    Miscellaneous                68    49.46      51    52.30      32    49.24

We find that $\chi^2 = 33.53$ for $\nu$ = (5 - 1)(3 - 1) = 8 d.f. and the
table shows that this is a highly significant value. We, therefore,
reject our hypothesis of no association between source of accident
and year, i.e., the probability of an accident at a given source is not
constant through the years considered.
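The same steps apply to any h x k table: form the marginal totals, compute each expected frequency as (row total x column total)/(grand total), and sum $(o - e)^2/e$. A minimal sketch for the carbon monoxide data, with expected frequencies left unrounded:

```python
table = [
    [24, 20, 19],     # blast furnaces
    [28, 34, 41],     # gas producers
    [26, 26, 10],     # gas ovens and works
    [80, 108, 123],   # distribution and use of gas
    [68, 51, 32],     # miscellaneous sources
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Expected frequency for each cell under the hypothesis of no association.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(table) - 1) * (len(table[0]) - 1)

print(f"chi2 = {chi2:.2f} on {dof} d.f.")
```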



11.8. Correction for Continuity. The $\chi^2$-distribution is
derived from the multinomial distribution on the assumption
that the expected frequencies in the cells are sufficiently large
to justify the use of Stirling's approximation to n! When, in
some cells or classes, these frequencies have fallen below 10,
we have adjusted matters by pooling the classes with such low
frequencies. If c such cells are pooled, the number of degrees
of freedom for $\chi^2$ is reduced by c - 1. If, however, we have a
2 x 2 table with low expected frequencies, no pooling is
possible since $\nu = 1$ for a 2 x 2 table. We have, therefore,
to tackle the problem from some other angle. In fact, we may
either modify the table and then apply the $\chi^2$-test or we may
abandon approximate methods and calculate from first
principles the exact probability of any given set of frequencies in
the cells for the given marginal totals. In the present section
we shall consider the method by which we "correct" the
observed frequencies to compensate somewhat for the fact that,
whereas the distribution of observed frequencies is necessarily
discrete, that of the $\chi^2$-distribution is essentially continuous.
In the course of treatment of the example in the next section
we shall develop and illustrate the "exact" method.

Suppose we toss an unbiased coin ten times. The expected
number of heads is ½ x 10 = 5 and the probability of obtaining
just r heads is the coefficient of $t^r$ in the expansion of $(\tfrac{1}{2} + \tfrac{1}{2}t)^{10}$.
We have :

The probability of 10 heads, P(10H) = $(\tfrac{1}{2})^{10}$ = 1/1024 = 0.000977.
This is also the probability of 0 heads (or 10 tails). Therefore
the probability of either 10H or 0H is 2/1024 = 0.00195.

Using the $\chi^2$-distribution, the value of $\chi^2$ for 10 heads or
0 heads is

$$\chi^2 = \frac{(10 - 5)^2}{5} + \frac{(0 - 5)^2}{5} = 10$$

and this value is attained or exceeded for $\nu = 1$ with a
probability of 0.00157. Half this, 0.000785, gives the $\chi^2$-estimate
of the probability of just 10 heads.

The probability of 9 heads is 10/1024 = 0.00977 and
hence the probability of 9 or more heads in 10 tosses is
11/1024 = 0.01074, while the probability of 9 or
more heads, or of 9 or more tails, is 22/1024 = 0.02148.

The corresponding value of $\chi^2 = \dfrac{(9 - 5)^2}{5} + \dfrac{(1 - 5)^2}{5} = 6.4$

and, for $\nu = 1$, the probability of this value being attained
or exceeded is 0.01141. Half this value, 0.005705, gives us the
$\chi^2$-estimate of the probability of obtaining 9 or more heads.

We can see that the $\chi^2$-estimates are already beginning to
diverge quite considerably from the "exact" values. The
problem is : can we improve matters by finding out why this
should be ?

We recall that when $\nu = 1$, the $\chi^2$-distribution reduces to the
positive half of a normal distribution. The area of the tail of
this distribution to the right of the ordinate corresponding to a
given deviation, r, of observed from expected frequency, gives
therefore a normal-distribution approximation to the
probability of a deviation attaining or exceeding this given deviation,
irrespective of sign. However, in the case we are considering,
the symmetrical binomial histogram is composed of frequency
cells based on unit class intervals, the central values of the
intervals being the various values of r ; the sum of the areas
of the cells corresponding to the values r, r + 1, r + 2, etc.,
gives the exact probability of a deviation ≥ + r. When,
however, the frequencies in the tail are small, we are taking, for
the continuous curve, the area to the right of r, but for the
histogram the area to the right of r - ½ (see Fig. 11.8).
Clearly a closer approximation would be obtained if we calculated
$\chi^2$ for values not of r, the deviation of observed from expected
frequency, but for values of |r| - ½, i.e., if we "correct" the
observed frequencies by making them ½ nearer expectation, we shall
obtain a "better" value of $\chi^2$.

Fig. 11.8.

This is Yates' correction for continuity for small expected
frequencies. Its justification is based on the assumption that
the theoretical frequency distribution is a symmetrical binomial
distribution (p = q = ½). If this is not so, the theoretical
distribution is skew, and no simple adjustment has been
discovered as yet to offset this. However, if p is near ½, the
correction should still be made when the expected frequencies
are small, for the resulting value of $\chi^2$ yields a probability
definitely closer to the "exact" value than that we obtain
when the correction is not made. This is brought out in the
following example.
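The coin-tossing comparison can be reproduced in a few lines. The sketch below is illustrative (function names are not from the text) and looks only at the case of 9 or more heads in 10 tosses, comparing the exact binomial tail with the one-tail $\chi^2$ estimate with and without the ½ correction.

```python
import math
from math import comb

n, expected = 10, 5.0

def chi2_tail_1df(x):
    """P(chi-square >= x) for one degree of freedom (two-sided normal tail)."""
    return math.erfc(math.sqrt(x / 2.0))

def exact_upper(r):
    """Exact probability of r or more heads in n tosses of a fair coin."""
    return sum(comb(n, k) for k in range(r, n + 1)) / 2**n

def chi2_upper(r, correct=False):
    """One-tail chi-square estimate; correct=True moves r by 1/2 towards expectation."""
    dev = abs(r - expected) - (0.5 if correct else 0.0)
    chi2 = 2.0 * dev**2 / expected     # heads and tails classes contribute equally
    return 0.5 * chi2_tail_1df(chi2)

r = 9
print(f"exact       : {exact_upper(r):.5f}")
print(f"uncorrected : {chi2_upper(r):.5f}")
print(f"corrected   : {chi2_upper(r, correct=True):.5f}")
```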

11.9. Worked Example. 

In experiments on the immunisation of cattle from tuberculosis,
the following results were obtained :

                                   Died of T.B.      Unaffected
                                   or very seri-     or slightly
                                   ously affected.   affected.      Totals.

    Inoculated with vaccine              6               13            19
    Not inoculated or inoculated
      with control media                 8                3            11

    Totals                              14               16            30

Show that for this table, on the hypothesis that inoculation and
susceptibility to tuberculosis are independent, $\chi^2 = 4.75$, P =
0.029 ; with a correction for continuity, the corresponding
probability is 0.072 ; and that by the exact method, P = 0.071.

(Data from Report on the Spahlinger Experiments in Northern
Ireland, 1931-1934, H.M.S.O., 1934, quoted in Kendall,
Advanced Theory of Statistics, I.)
Treatment : (1) On the hypothesis of independence (i.e., that the
probability of death is independent of inoculation) the probability
of death is 14/30. Therefore the expected frequencies are :

        8.87      10.13

        5.13       5.87

Each observed frequency deviates from the corresponding
expected frequency by ± 2.87. Hence

$$\chi^2 = (2.87)^2\left[\frac{1}{8.87} + \frac{1}{10.13} + \frac{1}{5.13} + \frac{1}{5.87}\right] = 8.237 \times 0.577 = 4.75$$

and for $\nu = 1$ the probability of $\chi^2$ attaining or exceeding this value
is 0.029. This figure, it must be emphasised, is the probability of
a proportion of deaths to unaffected cases of 6 : 13 or lower in a
sample of 19 inoculated animals and of a proportion of 8 : 3 or higher
in a sample of 11 animals not inoculated, on the hypothesis of
independence, i.e., on the assumption that the expected proportion for
either sample is 14 : 16.

(2) The observed frequencies with the continuity correction applied
are :

        6.5      12.5

        7.5       3.5

and, consequently, $\chi^2 = (2.37)^2 \times 0.577 = 3.24$, yielding, for $\nu = 1$,
P = 0.072.

(3) We must now discuss the method of finding the exact
probability of any particular array of cell frequencies for a 2 x 2 table.
Consider the table

        a            b            (a + b)
        c            d            (c + d)

     (a + c)      (b + d)         (a + b + c + d) = N, say.

First, we consider the number of ways in which such a table can
be set up with the margin totals given from a sample of N. From
N items we can select a + c items in $\binom{N}{a + c}$ ways, when b + d
items remain, while from N items we may select a + b items in
$\binom{N}{a + b}$ ways, with c + d items remaining. Therefore, the total
number of ways of setting up such a table with the marginal totals
as above is

$$\binom{N}{a + c}\binom{N}{a + b} = \frac{(N!)^2}{(a + c)!\,(b + d)!\,(a + b)!\,(c + d)!} = n_1, \text{ say.}$$

Secondly, we ask in how many ways we can complete the 4 cells
in the body of the table with N items. Clearly this is the number
of ways in which we can divide the N items into 4 groups of a items
of one kind, b items of a second kind, c items of a third kind and
d items of a fourth kind, where N = a + b + c + d. But (2.10)
we know this to be

$$\frac{N!}{a!\,b!\,c!\,d!} = n_2, \text{ say.}$$

Consequently the probability of any particular arrangement,
P(a, b, c, d), will be given by

$$P(a, b, c, d) = \frac{n_2}{n_1} = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{N!\,a!\,b!\,c!\,d!} \qquad (11.9.1)$$

How shall we use this result to solve our present problem ?
We are interested here, we emphasise, in the probability of
obtaining a proportion of deaths to unaffected cases of 6 : 13 or lower in a
sample of 19 inoculated animals and of obtaining a proportion of
deaths to unaffected cases of 8 : 3 or higher in a sample of 11 animals
not inoculated. In other words, we are interested in the probability
of each of the following arrays :

    | 6  13 |     | 5  14 |     |  4  15 |     |  3  16 |
    | 8   3 | ,   | 9   2 | ,   | 10   1 | ,   | 11   0 |

But it will be seen immediately that the probability of obtaining a
6 : 13 ratio among inoculated animals is also precisely that of
obtaining a ratio of 8 : 3 among animals not inoculated. Hence the
required probability will be twice that of the sum of the probabilities of
these 4 arrays.

The probability of the array with rows (3, 16) and (11, 0) is, by (11.9.1),

$$\frac{19!\,16!\,14!\,11!}{30!\,3!\,16!\,11!\,0!} = \frac{19!\,14!}{30!\,3!}.$$

We may evaluate this by means of a table of log factorials (e.g.,
Chambers' Shorter Six-Figure Mathematical Tables). We have

    log 19! = 17.085095        log 30! = 32.423660
    log 14! = 10.940408        log 3!  =  0.778151
              ---------                  ---------
              28.025503                  33.201811

28.025503 - 33.201811 = -5.176308 and the antilog of this is

    P(3, 16, 11, 0) = 0.00000666

    P(4, 15, 10, 1) = (16 x 11)/(4 x 1) x P(3, 16, 11, 0)
                    = 44 x 0.00000666 = 0.00029304

Similarly,

    P(5, 14, 9, 2) = (15 x 10)/(5 x 2) x P(4, 15, 10, 1)
                   = 15 x 0.00029304 = 0.00439560

Finally,

    P(6, 13, 8, 3) = (14 x 9)/(6 x 3) x P(5, 14, 9, 2)
                   = 7 x 0.00439560 = 0.03076920

The sum of these four probabilities is 0.03546450.
The required probability then is 2 x 0.03546450 = 0.07092900.
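The three results asked for in this example can be checked with a short sketch. It writes (11.9.1) in terms of binomial coefficients (an algebraically equivalent form, used here to avoid large factorials), sums the four arrays, and also evaluates the uncorrected and corrected $\chi^2$ probabilities. Function and variable names are illustrative.

```python
from math import comb, erfc, sqrt

def table_probability(a, b, c, d):
    """Probability (11.9.1) of a 2 x 2 array with the marginal totals fixed.

    (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!) equals
    C(a+b, a) * C(c+d, c) / C(N, a+c).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

a, b, c, d = 6, 13, 8, 3

# Arrays at least as extreme as that observed: a = 6, 5, 4, 3 with margins fixed.
tail = sum(table_probability(a - k, b + k, c + k, d - k) for k in range(min(a, d) + 1))
exact_p = 2.0 * tail

n = a + b + c + d
rows, cols = (a + b, c + d), (a + c, b + d)
inv_sum = sum(n / (r * s) for r in rows for s in cols)   # sum of 1/expected
dev = abs(a * d - b * c) / n                             # common deviation |o - e|

p_plain = erfc(sqrt(dev**2 * inv_sum / 2.0))             # chi-square, 1 d.f.
p_yates = erfc(sqrt((dev - 0.5) ** 2 * inv_sum / 2.0))   # with continuity correction

print(f"uncorrected P = {p_plain:.3f}, corrected P = {p_yates:.3f}, exact P = {exact_p:.3f}")
```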

11.10. $\chi^2$-determination of the Confidence Limits of the
Variance of a Normal Population. We conclude with an
example of the way the $\chi^2$-distribution may be used to give
exact results when the observed data are not frequencies.

Let us draw a small sample of N (< 30) from a normal
population. If $NS^2 = \sum_{i=1}^{N}(x_i - \bar{x})^2$ and $\sigma^2$ is the population
variance, $NS^2/\sigma^2 = \sum_{i=1}^{N}(x_i - \bar{x})^2/\sigma^2$ and is thus distributed
like $\chi^2$ with N - 1 degrees of freedom (the one constraint
being $\sum x_i = N\bar{x}$). Our problem is this :

Given N and $S^2$, to find the 95% confidence limits for $\sigma^2$.

Since $NS^2/\sigma^2$ is distributed like $\chi^2$, the value of $NS^2/\sigma^2$ that
will be exceeded with a probability of 0.05 will be the 0.05
point of the $\chi^2$-distribution for $\nu$ = N - 1. Let $\chi^2_{0.05}$ be this
value. Then the lower 95% confidence limit required, on the
basis of the sample information, will be $NS^2/\chi^2_{0.05}$. Likewise
the upper 95% confidence limit will be $NS^2/\chi^2_{0.95}$, where $\chi^2_{0.95}$
is the 0.95 point of the $\chi^2$-distribution for $\nu$ = N - 1.

Worked Example : A sample of 8 from a normal population yields
an unbiased estimate of the population variance of 4.4. Find the
95% confidence limits for $\sigma$.

Treatment : We have $4.4 = 8S^2/(8 - 1)$ or $8S^2 = 30.8$. The
0.95 and 0.05 points of the $\chi^2$-distribution for $\nu = 7$ are 2.17 and
14.07 respectively. Therefore the lower and upper 95% confidence
limits for $\sigma^2$ are, respectively,

30.8/14.07 = 2.19 and 30.8/2.17 = 14.19.
The corresponding limits for $\sigma$ are $(2.19)^{1/2} = 1.48$ and $(14.19)^{1/2} = 3.77$.
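The arithmetic of this example can be sketched as follows. The two $\chi^2$ points are taken from the text as tabulated constants rather than computed, and the variable names are illustrative.

```python
import math

n = 8
unbiased_var = 4.4
sum_sq = unbiased_var * (n - 1)     # N S^2, the sum of squared deviations (30.8)

# Tabulated points of chi-square for nu = 7, as quoted in the text:
# values exceeded with probability 0.95 and 0.05 respectively.
chi2_095, chi2_005 = 2.17, 14.07

var_limits = (sum_sq / chi2_005, sum_sq / chi2_095)
sd_limits = tuple(math.sqrt(v) for v in var_limits)

print(f"variance limits: {var_limits[0]:.2f} to {var_limits[1]:.2f}")
print(f"std. dev. limits: {sd_limits[0]:.2f} to {sd_limits[1]:.2f}")
```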

EXERCISES ON CHAPTER ELEVEN 



1. The Registrars-General give the following estimates of children
under five at mid-1947 :

                  England
                  and Wales.     Scotland.      Total.

    Males         1,813,000       228,000      2,041,000
    Females       1,723,000       221,000      1,944,000

    Total         3,536,000       449,000      3,985,000

On the assumption that there is no difference between the
proportion of males to females in the two regions, calculate the
probability that a child under five will be a girl. Hence find the expected
number of girls under five in Scotland and say whether the
proportion is significantly high. (L.U.)

2. The following data give N, the number of days on which rainfall
exceeded R in. at a certain station over a period of a year :

    R     0.00    0.04    0.10    0.20    0.50    1.00
    N      296     246     187     119      30       3

Test by means of $\chi^2$ whether the data are consistent with the law
$\log_{10} N = 2.47 - 1.98R$. Is the "fit" too good ? (R.S.S.)

3. The following information was obtained in a sample of 50 small 
general shops : 





Shops in 






Urban 


Rural 


Total 




Districts. 


Districts. 




Owned by man 


17 


18 


35 


,, women 


3 


12 


15 


Total . 


20 


30 


50 



Can it be said that there are relatively more women owners of small 
general shops in rural than in urban districts ? (L.U.) 

4. A certain hypothesis is tested by three similar experiments. These gave $\chi^2 = 11.9$ for $\nu = 6$, $\chi^2 = 14.8$ for $\nu = 8$ and $\chi^2 = 18.3$ for $\nu = 11$. Show that the three experiments together provide more justification for rejecting the hypothesis than any one experiment alone.

5. Apply the $\chi^2$ test of goodness of fit to the two theoretical distributions obtained in 4.7., p. 72.

Solutions

1. 219,000. Yes. 2. Far too good. $\chi^2 < 0.02$ for $\nu = 5$.

3. No.



APPENDIX : 

CONTINUOUS BIVARIATE DISTRIBUTIONS 

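Suppose that we have a sample of N value-pairs $(x_i, y_j)$ from some continuous bivariate parent population. Across the scatter diagram draw the lines

\[ x = x + \tfrac{1}{2}\Delta x, \quad x = x - \tfrac{1}{2}\Delta x, \quad y = y + \tfrac{1}{2}\Delta y \quad\text{and}\quad y = y - \tfrac{1}{2}\Delta y \]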

(Fig. A.1). Consider the rectangle ABCD of area $\Delta x\,\Delta y$ about the point (x, y). Within this rectangle will fall all those points representing value-pairs $(x_i, y_j)$ for which

\[ x - \tfrac{1}{2}\Delta x < x_i < x + \tfrac{1}{2}\Delta x \quad\text{and}\quad y - \tfrac{1}{2}\Delta y < y_j < y + \tfrac{1}{2}\Delta y \]

Fig. A.1.

Let the number of these points be $\Delta N$. The proportion of points inside the rectangle to the total number N in the diagram is then $\Delta N/N = \Delta p$, say. $\Delta p$, the relative frequency of the value-pairs falling within the rectangle ABCD, will clearly be < 1. The average, or mean, relative frequency per unit area within this rectangle is $\Delta p/\Delta A$, where $\Delta A = \Delta x \cdot \Delta y$. If we now increase N, the sample size, indefinitely and, simultaneously, reduce $\Delta x$ and $\Delta y$, we may write

\[ \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{\Delta p}{\Delta A} = \frac{dp}{dA}, \]
which is now the relative-frequency density at (x, y) of the con- 
tinuous parent population. In this parent population, how- 
ever, the values of the variates are distributed according to 
some law, which may be expressed by saying that the relative-frequency density at (x, y) is a certain function of x and y, $\phi(x, y)$, say. Thus

\[ \frac{dp}{dA} = \phi(x, y) \]

or, in differentials,

\[ dp = \phi(x, y)\,dA = \phi(x, y)\,dx\,dy \qquad \text{(A.1)} \]

Here dp is the relative frequency with which the variate x assumes a value between $x \pm \tfrac{1}{2}dx$, while, simultaneously, the variate y assumes a value between $y \pm \tfrac{1}{2}dy$. But, since the relative frequency of an event E converges stochastically, as the number of occurrences of its context-event tends to infinity, to the probability of E's occurrence in a single occurrence of its context-event, we may say:

dp is the probability that the variate x will assume a value between $x \pm \tfrac{1}{2}dx$, while, simultaneously, the variate y assumes a value between $y \pm \tfrac{1}{2}dy$. Then $\phi(x, y)$ is the joint probability function of x, y or the joint probability density of x, y.
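
The passage from the relative frequency per unit area, $\Delta p/\Delta A$, to the density $\phi(x, y)$ may be illustrated by a small sampling experiment. The sketch below is an illustration under stated assumptions: the parent population (x and y independent and uniform on the unit square, so that $\phi = 1$ there), the function name and the seed are all hypothetical choices.

```python
import random

def relative_frequency_density(points, x, y, dx, dy):
    """Delta p / Delta A for the rectangle of sides dx, dy centred on (x, y)."""
    inside = sum(1 for px, py in points
                 if abs(px - x) < dx / 2 and abs(py - y) < dy / 2)
    delta_p = inside / len(points)          # proportion of the sample in the rectangle
    return delta_p / (dx * dy)

random.seed(1)  # fixed seed so that the sketch is reproducible
# Hypothetical parent population: uniform on the unit square, density 1.
sample = [(random.random(), random.random()) for _ in range(200_000)]
estimate = relative_frequency_density(sample, 0.3, 0.6, 0.1, 0.1)
print(estimate)
```

With a larger sample and a smaller rectangle the estimate settles more closely about the density at the chosen point.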

Now let the range of possible values of x be $a \le x \le b$ and that of y, $c \le y \le d$; then, since both x and y must each assume some value,

\[ \int_c^d\int_a^b \phi(x, y)\,dx\,dy = 1 \]

or, if for values outside $a \le x \le b$, $c \le y \le d$, we define $\phi(x, y)$ to be zero, we may write

\[ \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \phi(x, y)\,dx\,dy = 1 \qquad \text{(A.2)} \]
It follows that the probability that $X_1 \le x \le X_2$ and $Y_1 \le y \le Y_2$ is

\[ P(X_1 \le x \le X_2,\; Y_1 \le y \le Y_2) = \int_{Y_1}^{Y_2}\int_{X_1}^{X_2} \phi(x, y)\,dx\,dy \qquad \text{(A.3)} \]
If x and y are statistically independent, i.e., if the probability, $dp_1 = \phi_1(x)\,dx$, of x taking a value between $x \pm \tfrac{1}{2}dx$ is independent of the value taken by y, and if the probability, $dp_2 = \phi_2(y)\,dy$, of y taking a value between $y \pm \tfrac{1}{2}dy$ is independent of the value taken by x, by the law of multiplication of probabilities we have

\[ \phi(x, y) = \phi_1(x)\,\phi_2(y) \qquad \text{(A.4)} \]

and all double integrals resolve into the product of two single integrals. For instance (A.3) becomes

\[ P(X_1 \le x \le X_2,\; Y_1 \le y \le Y_2) = \int_{X_1}^{X_2} \phi_1(x)\,dx \cdot \int_{Y_1}^{Y_2} \phi_2(y)\,dy \]

Fig. A.2.

Clearly variates for which (A.4) holds are uncorrelated.
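
The resolution of the double integral (A.3) into a product of single integrals can be seen numerically. The following sketch assumes two hypothetical marginal densities on the unit interval, $\phi_1(x) = 2x$ and $\phi_2(y) = 3y^2$, and uses a simple midpoint rule; none of these choices comes from the text.

```python
def midpoint_integral(f, a, b, steps=400):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Hypothetical marginal densities on 0 <= x <= 1 and 0 <= y <= 1.
def phi1(x):
    return 2.0 * x

def phi2(y):
    return 3.0 * y * y

def phi(x, y):
    return phi1(x) * phi2(y)    # (A.4): joint density of independent variates

X1, X2, Y1, Y2 = 0.2, 0.5, 0.1, 0.8

# (A.3) as it stands: integrate over x for each y, then over y.
double = midpoint_integral(
    lambda y: midpoint_integral(lambda x: phi(x, y), X1, X2), Y1, Y2)
# The same probability as a product of two single integrals.
product = midpoint_integral(phi1, X1, X2) * midpoint_integral(phi2, Y1, Y2)
print(double, product)
```

For these densities the exact value is $(0.5^2 - 0.2^2)(0.8^3 - 0.1^3) = 0.21 \times 0.511$.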

In Fig. A.2 let the rectangle ABCD in the xOy plane be formed by the lines $x = x \pm \tfrac{1}{2}dx$, $y = y \pm \tfrac{1}{2}dy$. At every point P, (x, y), in this plane for which $\phi(x, y)$ is defined, erect a perpendicular, z, of length $\phi(x, y)$. Then as x and y vary over the xOy plane, the end-point of this perpendicular generates a surface $z = \phi(x, y)$, the probability- or correlation-surface. dp, the probability that x lies within $x \pm \tfrac{1}{2}dx$ and y within $y \pm \tfrac{1}{2}dy$, is then represented by the volume of the right prism on ABCD as base below the correlation-surface.

Moments. Let the $\Delta N$ values of (x, y) lying within the rectangle $\Delta x\,\Delta y$ of the scatter diagram of our sample of N value-pairs from a continuous bivariate population be considered as "grouped" in this class-rectangle. Their product moment of order r, s about x = 0, y = 0, $m'_{rs}$, is given by

\[ m'_{rs} = \frac{\Delta N}{N}\,x^r y^s = x^r y^s\,\frac{\Delta p}{\Delta A}\,\Delta A \]

For the corresponding class rectangle of the continuous parent population, we have, accordingly,

\[ \mu'_{rs} = \frac{dp}{dA}\,x^r y^s\,dA = \phi(x, y)\,x^r y^s\,dx\,dy; \]

therefore, for the entire parent distribution,

\[ \mu'_{rs} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \phi(x, y)\,x^r y^s\,dx\,dy \qquad \text{(A.5)} \]



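In particular:

\[ \begin{aligned}
\bar{x} = \mu'_{10} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx\,dy; \\
\bar{y} = \mu'_{01} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} y\,\phi(x, y)\,dx\,dy; \\
\mu'_{20} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^2\,\phi(x, y)\,dx\,dy; \\
\mu'_{02} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} y^2\,\phi(x, y)\,dx\,dy; \\
\mu'_{11} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xy\,\phi(x, y)\,dx\,dy
\end{aligned} \qquad \text{(A.6)} \]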
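The corresponding moments about the mean $(\bar{x}, \bar{y})$ of the distribution are:

\[ \begin{aligned}
\sigma_x^2 = \mu_{20} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x - \bar{x})^2\,\phi(x, y)\,dx\,dy, \\
\sigma_y^2 = \mu_{02} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (y - \bar{y})^2\,\phi(x, y)\,dx\,dy, \\
\operatorname{cov}(x, y) = \mu_{11} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x - \bar{x})(y - \bar{y})\,\phi(x, y)\,dx\,dy
\end{aligned} \qquad \text{(A.7)} \]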



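Also, since $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \phi(x, y)\,dx\,dy = 1$ (A.2), we have:

\[ \begin{aligned}
\mu_{20} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x^2 - 2\bar{x}x + \bar{x}^2)\,\phi(x, y)\,dx\,dy \\
&= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^2\phi(x, y)\,dx\,dy - 2\bar{x}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx\,dy + \bar{x}^2
\end{aligned} \]

or

\[ \sigma_x^2 = \mu_{20} = \mu'_{20} - \bar{x}^2 \]

and likewise

\[ \sigma_y^2 = \mu_{02} = \mu'_{02} - \bar{y}^2 \qquad \text{(A.8)} \]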

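Finally,

\[ \begin{aligned}
\mu_{11} &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (xy - \bar{x}y - x\bar{y} + \bar{x}\bar{y})\,\phi(x, y)\,dx\,dy \\
&= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xy\,\phi(x, y)\,dx\,dy - \bar{y}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx\,dy \\
&\qquad - \bar{x}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} y\,\phi(x, y)\,dx\,dy + \bar{x}\bar{y} \\
&= \mu'_{11} - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y}
\end{aligned} \]

i.e.,

\[ \sigma_{xy} = \mu_{11} = \mu'_{11} - \bar{x}\bar{y} \qquad \text{(A.9)} \]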
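
Relations (A.8) and (A.9) may be verified numerically for any particular density. The sketch below assumes, purely for illustration, the density $\phi(x, y) = x + y$ on the unit square (zero elsewhere), which integrates to unity; the function name and the midpoint rule are likewise hypothetical choices.

```python
def product_moment(phi, r, s, cx=0.0, cy=0.0, steps=200):
    """Midpoint-rule value of the double integral of
    (x - cx)^r (y - cy)^s phi(x, y) over the unit square."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(steps):
            y = (j + 0.5) * h
            total += (x - cx) ** r * (y - cy) ** s * phi(x, y)
    return total * h * h

def phi(x, y):
    return x + y    # hypothetical density on the unit square

x_bar = product_moment(phi, 1, 0)                 # mu'_10
y_bar = product_moment(phi, 0, 1)                 # mu'_01
mu20 = product_moment(phi, 2, 0, x_bar, y_bar)    # sigma_x^2 directly, as in (A.7)
mu11 = product_moment(phi, 1, 1, x_bar, y_bar)    # cov(x, y) directly, as in (A.7)

# The same quantities by way of (A.8) and (A.9).
mu20_via_raw = product_moment(phi, 2, 0) - x_bar ** 2
mu11_via_raw = product_moment(phi, 1, 1) - x_bar * y_bar
print(mu20, mu20_via_raw, mu11, mu11_via_raw)
```

For this density direct integration gives $\bar{x} = 7/12$, $\sigma_x^2 = 11/144$ and $\operatorname{cov}(x, y) = -1/144$, with which the numerical values may be compared.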

The moment-generating function, $M(t_1, t_2)$, of a bivariate continuous distribution is defined to be

\[ M(t_1, t_2) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \exp(xt_1 + yt_2)\,\phi(x, y)\,dx\,dy \qquad \text{(A.10)} \]

As the reader should verify, the moment of order r, s about x = 0, y = 0 is given by the coefficient of $t_1^r t_2^s/r!\,s!$ in the expansion of $M(t_1, t_2)$.

Regression and Correlation. Since the probability, dp, that x lies between $x \pm \tfrac{1}{2}dx$ when y lies between $y \pm \tfrac{1}{2}dy$ is $dp = \phi(x, y)\,dx\,dy$, the probability that x lies between $x \pm \tfrac{1}{2}dx$ when y takes any value in its range is

\[ \int_{-\infty}^{+\infty} \bigl(\phi(x, y)\,dx\bigr)\,dy = dx\int_{-\infty}^{+\infty} \phi(x, y)\,dy \]

Now $\int_{-\infty}^{+\infty} \phi(x, y)\,dy$ is a function of x, $\phi_1(x)$, say, and is the relative frequency of the y's in the x-array. Likewise $\int_{-\infty}^{+\infty} \phi(x, y)\,dx = \phi_2(y)$, where $\phi_2(y)\,dy$ is the probability that y lies between $y \pm \tfrac{1}{2}dy$ for any value of x, i.e., $\phi_2(y)$ is the relative frequency of the x's in the y-array.

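The mean of the y's in the x-array, $\bar{y}_x$, is, then, given by

\[ \bar{y}_x\,\phi_1(x) = \int_{-\infty}^{+\infty} y\,\phi(x, y)\,dy \]

or

\[ \bar{y}_x = \frac{1}{\phi_1(x)}\int_{-\infty}^{+\infty} y\,\phi(x, y)\,dy. \qquad \text{Likewise,} \qquad \bar{x}_y = \frac{1}{\phi_2(y)}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx \qquad \text{(A.11)} \]

Exercise: Show that the variance of the y's in the x-array, $(\sigma_y^2)_x$, is

\[ \frac{1}{\phi_1(x)}\int_{-\infty}^{+\infty} (y - \bar{y}_x)^2\,\phi(x, y)\,dy \]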

Now the right-hand side of (A.11) is a function of x, and, therefore, the equation of the curve of means of the x-arrays, i.e., the curve of regression of y on x, is

\[ \bar{y}_x = \frac{1}{\phi_1(x)}\int_{-\infty}^{+\infty} y\,\phi(x, y)\,dy \qquad \text{(A.12)} \]

while the regression equation of x on y is

\[ \bar{x}_y = \frac{1}{\phi_2(y)}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx \qquad \text{(A.13)} \]

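If the regression of y on x is linear, we must have

\[ \bar{y}_x = \frac{1}{\phi_1(x)}\int_{-\infty}^{+\infty} y\,\phi(x, y)\,dy = Ax + B \qquad \text{(A.14)} \]

Multiply this equation by $\phi_1(x)$ and integrate over the whole x-range; we have

\[ \begin{aligned}
\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} y\,\phi(x, y)\,dx\,dy &= A\int_{-\infty}^{+\infty} x\,\phi_1(x)\,dx + B\int_{-\infty}^{+\infty} \phi_1(x)\,dx \\
&= A\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx\,dy + B\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \phi(x, y)\,dx\,dy
\end{aligned} \]

i.e.,

\[ \mu'_{01} = A\mu'_{10} + B \qquad \text{(A.15)} \]

Now multiply (A.14) by $x\,\phi_1(x)$ and integrate, obtaining

\[ \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xy\,\phi(x, y)\,dx\,dy = A\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^2\phi(x, y)\,dx\,dy + B\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\,\phi(x, y)\,dx\,dy \]

i.e.,

\[ \mu'_{11} = A\mu'_{20} + B\mu'_{10} \qquad \text{(A.16)} \]

Solving (A.15) and (A.16) for A and B, we have

\[ A = \frac{\mu'_{11} - \mu'_{10}\mu'_{01}}{\mu'_{20} - (\mu'_{10})^2} = \frac{\mu_{11}}{\mu_{20}} = \frac{\sigma_{xy}}{\sigma_x^2} \]

\[ B = \frac{\mu'_{01}\mu'_{20} - \mu'_{10}\mu'_{11}}{\mu'_{20} - (\mu'_{10})^2} = \frac{\mu'_{01}(\mu_{20} + \mu'^{\,2}_{10}) - \mu'_{10}(\mu_{11} + \mu'_{10}\mu'_{01})}{\mu_{20}} = \bar{y} - \frac{\sigma_{xy}}{\sigma_x^2}\,\bar{x} \]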

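Consequently, (A.14) becomes

\[ \bar{y}_x - \bar{y} = \frac{\sigma_{xy}}{\sigma_x^2}\,(x - \bar{x}) \qquad \text{(A.17)} \]

and the correlation coefficient between x and y, $\rho$, is

\[ \rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{\mu_{11}}{\sqrt{\mu_{20}\,\mu_{02}}} \qquad \text{(A.18)} \]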

It should be noted, however, that if we define p by (A. 18), 
this does not necessitate that the regression be linear. 

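Finally, consider the standard error of estimate of y from (A.17): since

\[ \frac{\sigma_{xy}}{\sigma_x^2} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\cdot\frac{\sigma_y}{\sigma_x} = \rho\,\frac{\sigma_y}{\sigma_x}, \]

\[ \begin{aligned}
S_y^2 &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \Bigl[y - \bar{y} - \rho\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})\Bigr]^2 \phi(x, y)\,dx\,dy \\
&= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \Bigl[(y - \bar{y})^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}(y - \bar{y})(x - \bar{x}) + \rho^2\frac{\sigma_y^2}{\sigma_x^2}(x - \bar{x})^2\Bigr] \phi(x, y)\,dx\,dy \\
&= \sigma_y^2 - 2\rho^2\sigma_y^2 + \rho^2\sigma_y^2
\end{aligned} \]

i.e.,

\[ S_y^2 = \sigma_y^2(1 - \rho^2) \qquad \text{(A.19)} \]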
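
Result (A.19) concerns the straight line (A.17) and may be checked for a density whose true regression is not linear. The following sketch assumes, for illustration only, the density $\phi(x, y) = \tfrac{3}{2}(x^2 + y^2)$ on the unit square; the helper name and the grid approximation are hypothetical.

```python
def weighted_grid(phi, steps=200):
    """(x, y, weight) triples from a midpoint grid on the unit square,
    with the weights rescaled so that they sum to 1."""
    h = 1.0 / steps
    cells = [((i + 0.5) * h, (j + 0.5) * h)
             for i in range(steps) for j in range(steps)]
    raw = [phi(x, y) for x, y in cells]
    total = sum(raw)
    return [(x, y, w / total) for (x, y), w in zip(cells, raw)]

def phi(x, y):
    return 1.5 * (x * x + y * y)    # hypothetical density on the unit square

grid = weighted_grid(phi)
x_bar = sum(w * x for x, y, w in grid)
y_bar = sum(w * y for x, y, w in grid)
var_x = sum(w * (x - x_bar) ** 2 for x, y, w in grid)
var_y = sum(w * (y - y_bar) ** 2 for x, y, w in grid)
cov_xy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in grid)

rho = cov_xy / (var_x * var_y) ** 0.5       # (A.18)
slope = cov_xy / var_x                      # coefficient of (x - x_bar) in (A.17)
# Mean squared deviation of y from the line (A.17).
s_y_squared = sum(w * (y - y_bar - slope * (x - x_bar)) ** 2 for x, y, w in grid)
print(rho, s_y_squared, var_y * (1 - rho ** 2))
```

For this density $\rho = -15/73$, and the two printed values of $S_y^2$ agree to within rounding.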

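The Bivariate Normal Distribution. Consider the bivariate distribution whose probability density is

\[ \left.\begin{aligned}
\phi(x, y) &= C\exp(-Z) \\
\text{where}\quad C &= \frac{1}{2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}} \\
\text{and}\quad Z &= \Bigl\{\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\Bigr\}\Big/ 2(1 - \rho^2)
\end{aligned}\right\} \qquad \text{(A.20)} \]

The distribution so defined is called the Bivariate Normal distribution. For the moment we shall regard $\sigma_x$, $\sigma_y$ and $\rho$ as undefined constants.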

We begin by finding what are usually called the marginal distributions, viz., the probability function of x for any value or all values of y, and the probability function of y for any or all x. The probability that x lies between $x \pm \tfrac{1}{2}dx$, whatever the value assumed by y, is given by

\[ \phi_1(x)\,dx = dx\int_{-\infty}^{+\infty} \phi(x, y)\,dy \]
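i.e.,

\[ \begin{aligned}
\phi_1(x) &= C\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{1}{2(1 - \rho^2)}\Bigl[\Bigl(\frac{\rho^2x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\Bigr) + (1 - \rho^2)\frac{x^2}{\sigma_x^2}\Bigr]\Bigr\}\,dy \\
&= C\exp\{-x^2/2\sigma_x^2\} \times \int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{1}{2(1 - \rho^2)}\Bigl(\frac{y}{\sigma_y} - \frac{\rho x}{\sigma_x}\Bigr)^2\Bigr\}\,dy
\end{aligned} \]

Put $Y = y/\sigma_y - \rho x/\sigma_x$.

Then $dy = \sigma_y\,dY$ and as $y \to \pm\infty$, $Y \to \pm\infty$. Hence,

\[ \phi_1(x) = C\sigma_y\exp(-x^2/2\sigma_x^2)\int_{-\infty}^{+\infty} \exp[-Y^2/2(1 - \rho^2)]\,dY \]

But (see footnote to 5.4 (e), page 82)

\[ \int_{-\infty}^{+\infty} \exp[-Y^2/2(1 - \rho^2)]\,dY = [2\pi(1 - \rho^2)]^{1/2}. \]

Therefore

\[ \phi_1(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\exp(-x^2/2\sigma_x^2) \qquad \text{(A.21)} \]

and, likewise,

\[ \phi_2(y) = \frac{1}{\sigma_y\sqrt{2\pi}}\exp(-y^2/2\sigma_y^2) \]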
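
The reduction to (A.21) may be confirmed by integrating (A.20) over y numerically. In the sketch below the parameter values, the function names and the range and step of the midpoint rule are all hypothetical choices made for illustration.

```python
import math

SIGMA_X, SIGMA_Y, RHO = 1.5, 0.8, 0.6    # hypothetical parameter values

def phi(x, y):
    """Bivariate normal density (A.20) with zero means."""
    c = 1.0 / (2.0 * math.pi * SIGMA_X * SIGMA_Y * math.sqrt(1.0 - RHO ** 2))
    z = (x * x / SIGMA_X ** 2
         - 2.0 * RHO * x * y / (SIGMA_X * SIGMA_Y)
         + y * y / SIGMA_Y ** 2) / (2.0 * (1.0 - RHO ** 2))
    return c * math.exp(-z)

def marginal_x_numeric(x, half_width=12.0, steps=4000):
    """Midpoint-rule integral of phi(x, y) over -half_width < y < half_width."""
    h = 2.0 * half_width / steps
    return h * sum(phi(x, -half_width + (k + 0.5) * h) for k in range(steps))

def marginal_x_closed_form(x):
    """Right-hand side of (A.21)."""
    return math.exp(-x * x / (2.0 * SIGMA_X ** 2)) / (SIGMA_X * math.sqrt(2.0 * math.pi))

for x in (-2.0, 0.0, 0.7, 3.0):
    print(x, marginal_x_numeric(x), marginal_x_closed_form(x))
```

The two columns printed agree closely at each x, whatever value of $\rho$ between -1 and +1 is substituted.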

Thus, marginally, both x and y are normally distributed with zero mean and variances $\sigma_x^2$ and $\sigma_y^2$ respectively. Moreover, if we put $\rho = 0$ in (A.20),

\[ \phi(x, y) = \frac{1}{\sigma_x\sqrt{2\pi}}\exp(-x^2/2\sigma_x^2) \cdot \frac{1}{\sigma_y\sqrt{2\pi}}\exp(-y^2/2\sigma_y^2) \]

and thus we obtain a clue to the significance of $\rho$: when $\rho = 0$ the variates are uncorrelated. May not $\rho$ then be the correlation coefficient of x and y? That is indeed so and that var (x) and var (y) are respectively $\sigma_x^2$ and $\sigma_y^2$ may be seen by considering the moment-generating function of the distribution. We have

\[ M(t_1, t_2) = C\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \exp\Bigl[xt_1 + yt_2 - \frac{1}{2(1 - \rho^2)}\Bigl(\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\Bigr)\Bigr]\,dx\,dy \qquad \text{(A.22)} \]
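The reader will verify that, if we make the substitutions

\[ X = x - \sigma_x(\sigma_xt_1 + \rho\sigma_yt_2), \qquad Y = y - \sigma_y(\sigma_yt_2 + \rho\sigma_xt_1), \]

(A.22) becomes

\[ M(t_1, t_2) = \exp\bigl[\tfrac{1}{2}(\sigma_x^2t_1^2 + 2\rho\sigma_x\sigma_yt_1t_2 + \sigma_y^2t_2^2)\bigr] \times \frac{1}{2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \exp\Bigl[-\Bigl(\frac{X^2}{\sigma_x^2} - \frac{2\rho XY}{\sigma_x\sigma_y} + \frac{Y^2}{\sigma_y^2}\Bigr)\Big/ 2(1 - \rho^2)\Bigr]\,dX\,dY \]

But the second factor in the right-hand expression is equal to unity; therefore

\[ M(t_1, t_2) = \exp\bigl[\tfrac{1}{2}(\sigma_x^2t_1^2 + 2\rho\sigma_x\sigma_yt_1t_2 + \sigma_y^2t_2^2)\bigr] \qquad \text{(A.23)} \]

But the exponential contains only squares of $t_1$ and $t_2$ and the product $t_1t_2$ and, so, the coefficients of $t_1$ and $t_2$ in its expansion are zero, i.e., $\bar{x} = 0 = \bar{y}$, and $M(t_1, t_2)$ is the mean-moment generating function. Consequently

\[ \mu_{20} = \sigma_x^2; \qquad \mu_{11} = \rho\sigma_x\sigma_y \ \text{ or } \ \rho = \frac{\mu_{11}}{\sigma_x\sigma_y}; \qquad \mu_{02} = \sigma_y^2 \]

Thus, although up to now we have not considered $\sigma_x$, $\sigma_y$, and $\rho$ to be other than undefined constants, they are in fact the standard deviations of x and y and the correlation coefficient of x and y respectively.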
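
The closed form (A.23) may be compared with a direct numerical evaluation of the double integral (A.22). The sketch below is illustrative: the parameter values, the arguments $t_1 = 0.3$, $t_2 = 0.2$, and the range and step of the midpoint rule are hypothetical choices.

```python
import math

SIGMA_X, SIGMA_Y, RHO = 1.0, 2.0, 0.5    # hypothetical parameter values

def phi(x, y):
    """Bivariate normal density (A.20) with zero means."""
    c = 1.0 / (2.0 * math.pi * SIGMA_X * SIGMA_Y * math.sqrt(1.0 - RHO ** 2))
    z = (x * x / SIGMA_X ** 2
         - 2.0 * RHO * x * y / (SIGMA_X * SIGMA_Y)
         + y * y / SIGMA_Y ** 2) / (2.0 * (1.0 - RHO ** 2))
    return c * math.exp(-z)

def mgf_numeric(t1, t2, steps=400, width=10.0):
    """Midpoint-rule value of (A.22) over |x| < width*SIGMA_X, |y| < width*SIGMA_Y."""
    hx = 2.0 * width * SIGMA_X / steps
    hy = 2.0 * width * SIGMA_Y / steps
    total = 0.0
    for i in range(steps):
        x = -width * SIGMA_X + (i + 0.5) * hx
        for j in range(steps):
            y = -width * SIGMA_Y + (j + 0.5) * hy
            total += math.exp(x * t1 + y * t2) * phi(x, y)
    return total * hx * hy

def mgf_closed_form(t1, t2):
    """Right-hand side of (A.23)."""
    return math.exp(0.5 * (SIGMA_X ** 2 * t1 ** 2
                           + 2.0 * RHO * SIGMA_X * SIGMA_Y * t1 * t2
                           + SIGMA_Y ** 2 * t2 ** 2))

print(mgf_numeric(0.3, 0.2), mgf_closed_form(0.3, 0.2))
```

With these values the exponent in (A.23) is $\tfrac{1}{2}(0.09 + 0.12 + 0.16) = 0.185$.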

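Exercise: Show that: (i) $\mu_{rs} = 0$ when r + s is odd; (ii) $\mu_{40} = 3\sigma_x^4$; $\mu_{31} = 3\rho\sigma_x^3\sigma_y$; $\mu_{22} = (1 + 2\rho^2)\sigma_x^2\sigma_y^2$; $\mu_{13} = 3\rho\sigma_x\sigma_y^3$; $\mu_{04} = 3\sigma_y^4$.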
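Some other properties: (A.20) may be written

\[ \begin{aligned}
\phi(x, y) &= \frac{1}{2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}}\exp\Bigl[-\Bigl(\frac{y^2}{\sigma_y^2} - \frac{2\rho yx}{\sigma_x\sigma_y} + \frac{\rho^2x^2}{\sigma_x^2}\Bigr)\Big/ 2(1 - \rho^2)\Bigr] \cdot \exp(-x^2/2\sigma_x^2) \\
&= \frac{1}{\sigma_x\sqrt{2\pi}}\exp(-x^2/2\sigma_x^2) \cdot \frac{1}{\sigma_y(1 - \rho^2)^{1/2}\sqrt{2\pi}} \times \exp\Bigl\{-\Bigl[\Bigl(y - \rho\frac{\sigma_y}{\sigma_x}x\Bigr)^2\Big/ 2\sigma_y^2(1 - \rho^2)\Bigr]\Bigr\}
\end{aligned} \]

But by (A.19) $S_y^2 = \sigma_y^2(1 - \rho^2)$, and, therefore,

\[ \phi(x, y) = \frac{1}{\sigma_x\sqrt{2\pi}}\exp(-x^2/2\sigma_x^2) \cdot \frac{1}{S_y\sqrt{2\pi}}\exp\Bigl[-\Bigl(y - \rho\frac{\sigma_y}{\sigma_x}x\Bigr)^2\Big/ 2S_y^2\Bigr] \]

We see, then, that, if x is held constant, i.e., in any given x-array, the y's are distributed normally with mean $\bar{y}_x = \rho\sigma_yx/\sigma_x$ and variance $S_y^2$.

Thus the regression of y on x is linear, the regression equation being

\[ y = \rho\,\frac{\sigma_y}{\sigma_x}\,x \qquad \text{(A.24)} \]

and the variance in each array, being $S_y^2$, is constant. Consequently, regression for the bivariate normal distribution is homoscedastic (see 6.4).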
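
That the array means lie on the line (A.24), and that the array variance is the same in every array, may be checked numerically. The sketch below is an illustration only; the parameter values, the function names and the midpoint rule are hypothetical choices.

```python
import math

SIGMA_X, SIGMA_Y, RHO = 2.0, 1.0, -0.4    # hypothetical parameter values

def phi(x, y):
    """Bivariate normal density (A.20) with zero means."""
    c = 1.0 / (2.0 * math.pi * SIGMA_X * SIGMA_Y * math.sqrt(1.0 - RHO ** 2))
    z = (x * x / SIGMA_X ** 2
         - 2.0 * RHO * x * y / (SIGMA_X * SIGMA_Y)
         + y * y / SIGMA_Y ** 2) / (2.0 * (1.0 - RHO ** 2))
    return c * math.exp(-z)

def array_mean_and_variance(x, half_width=12.0, steps=4000):
    """Mean and variance of y in the x-array, by midpoint-rule integration over y."""
    h = 2.0 * half_width / steps
    ys = [-half_width + (k + 0.5) * h for k in range(steps)]
    weights = [phi(x, y) for y in ys]
    total = sum(weights)                  # proportional to the marginal density at x
    mean = sum(w * y for w, y in zip(weights, ys)) / total
    variance = sum(w * (y - mean) ** 2 for w, y in zip(weights, ys)) / total
    return mean, variance

for x in (-3.0, 0.0, 1.5):
    mean, variance = array_mean_and_variance(x)
    print(x, mean, RHO * SIGMA_Y * x / SIGMA_X, variance)
```

The variance printed is the same for each x and equals $\sigma_y^2(1 - \rho^2)$, here 0.84.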

SOME MATHEMATICAL SYMBOLS AND
THEIR MEANINGS

exp x = $e^x$, exponential function, where e = 2.71828 . . . is base of natural logarithms.

log x = $\log_e x$, natural logarithm.

$\Delta x$ = small increment of x.

$\mathop{\mathrm{Lt}}_{x \to a} f(x)$ or $\mathop{\mathrm{Limit}}_{x \to a} f(x)$ = the limit of f(x) as x tends to a.

$\to$ = tends to (limit).

$\infty$ = infinity.

n! = factorial n, n(n - 1)(n - 2) . . . 3.2.1

$\binom{r}{s} = \dfrac{r!}{s!\,(r - s)!}$

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \ldots + x_n$; sum of . . .

$\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} = x_{11} + x_{12} + \ldots + x_{1n} + \ldots + x_{m1} + x_{m2} + \ldots + x_{mn}$

$\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot x_3 \ldots x_{n-1} \cdot x_n$; product of . . .

$\approx$ = approximately equal to.

> ; < = greater than ; less than.

$\ge$ ; $\le$ = greater than or equal to ; less than or equal to.

Population parameters are, in general, denoted by Greek letters; estimates of these parameters from a sample are denoted by the corresponding Roman letter.